diff --git a/prompt-papers-old.md b/prompt-papers-old.md
deleted file mode 100644
index 34d4b6e..0000000
--- a/prompt-papers-old.md
+++ /dev/null
@@ -1,7421 +0,0 @@
-# Accepted Papers
-
-## [ICPC: In-context Prompt Compression with Faster Inference](https://arxiv.org/abs/http://arxiv.org/abs/2501.01625v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01625v1
-
-**Abstract:**
-> Despite the recent success of Large Language Models (LLMs), it remains
-> challenging to feed LLMs with long prompts due to the fixed size of LLM inputs.
-> As a remedy, prompt compression becomes a promising solution by removing
-> redundant tokens in the prompt. However, the use of an LLM in existing works
-> requires additional computation resources and leads to memory overheads. To
-> address it, we propose ICPC (In-context Prompt Compression), a novel and
-> scalable prompt compression method that adaptively reduces the prompt length.
-> The key idea of ICPC is to calculate the probability of each word appearing in
-> the prompt using encoders and calculate information carried by each word
-> through the information function, which effectively reduces the information
-> loss during prompt compression and increases the speed of compression.
-> Empirically, we demonstrate that ICPC can effectively compress long texts of
-> different categories and thus achieve better performance and speed on different
-> types of NLP tasks.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "ACCEPT",
-    "explanation": "The paper meets all 'MUST' criteria: it focuses primarily on prompt engineering for LLMs (in-context prompt compression), investigates a method to improve LLM performance through prompt manipulation (reducing information loss during compression), and likely provides concrete examples (though not explicitly stated in the abstract, implied through 'empirically demonstrate' on various NLP tasks)."
-}
-
----
-
-## [Adaptive Few-shot Prompting for Machine Translation with Pre-trained
-  Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01679v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01679v1
-
-**Abstract:**
-> Recently, large language models (LLMs) with in-context learning have
-> demonstrated remarkable potential in handling neural machine translation.
-> However, existing evidence shows that LLMs are prompt-sensitive and it is
-> sub-optimal to apply the fixed prompt to any input for downstream machine
-> translation tasks. To address this issue, we propose an adaptive few-shot
-> prompting (AFSP) framework to automatically select suitable translation
-> demonstrations for various source input sentences to further elicit the
-> translation capability of an LLM for better machine translation. First, we
-> build a translation demonstration retrieval module based on LLM's embedding to
-> retrieve top-k semantic-similar translation demonstrations from aligned
-> parallel translation corpus. Rather than using other embedding models for
-> semantic demonstration retrieval, we build a hybrid demonstration retrieval
-> module based on the embedding layer of the deployed LLM to build better input
-> representation for retrieving more semantic-related translation demonstrations.
-> Then, to ensure better semantic consistency between source inputs and target
-> outputs, we force the deployed LLM itself to generate multiple output
-> candidates in the target language with the help of translation demonstrations
-> and rerank these candidates. Besides, to better evaluate the effectiveness of
-> our AFSP framework on the latest language and extend the research boundary of
-> neural machine translation, we construct a high-quality diplomatic
-> Chinese-English parallel dataset that consists of 5,528 parallel
-> Chinese-English sentences. Finally, extensive experiments on the proposed
-> diplomatic Chinese-English parallel dataset and the United Nations Parallel
-> Corpus (Chinese-English part) show the effectiveness and superiority of our
-> proposed AFSP.
-
-**Decision Explanation:**
-Original decision: ACCEPT
-The paper meets all 'MUST' criteria: it focuses on prompt engineering for LLMs (adaptive few-shot prompting for machine translation), investigates improving LLM performance through prompt manipulation, and provides concrete examples with a replicable framework using publicly available LLMs, as evidenced by experiments on constructed and existing datasets.
-
----
-
-## [Table as Thought: Exploring Structured Thoughts in LLM Reasoning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02152v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02152v1
-
-**Abstract:**
-> Large language models' reasoning abilities benefit from methods that organize
-> their thought processes, such as chain-of-thought prompting, which employs a
-> sequential structure to guide the reasoning process step-by-step. However,
-> existing approaches focus primarily on organizing the sequence of thoughts,
-> leaving structure in individual thought steps underexplored. To address this
-> gap, we propose Table as Thought, a framework inspired by cognitive
-> neuroscience theories on human thought. Table as Thought organizes reasoning
-> within a tabular schema, where rows represent sequential thought steps and
-> columns capture critical constraints and contextual information to enhance
-> reasoning. The reasoning process iteratively populates the table until
-> self-verification ensures completeness and correctness. Our experiments show
-> that Table as Thought excels in planning tasks and demonstrates a strong
-> potential for enhancing LLM performance in mathematical reasoning compared to
-> unstructured thought baselines. This work provides a novel exploration of
-> refining thought representation within LLMs, paving the way for advancements in
-> reasoning and AI cognition.
-
-**Decision Explanation:**
-Original decision: ACCEPT
-The paper focuses primarily on prompt engineering for LLMs by proposing a novel 'Table as Thought' framework, which organizes reasoning within a tabular schema to enhance LLM performance. It provides concrete examples (planning and mathematical reasoning tasks) and demonstrates the impact of this structured prompt approach on LLM output, meeting all required criteria.
-
----
-
-## [The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for
-  Test Case Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01329v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01329v1
-
-**Abstract:**
-> Test cases are essential for validating the reliability and quality of
-> software applications. Recent studies have demonstrated the capability of Large
-> Language Models (LLMs) to generate useful test cases for given source code.
-> However, the existing work primarily relies on human-written plain prompts,
-> which often leads to suboptimal results since the performance of LLMs can be
-> highly influenced by the prompts. Moreover, these approaches use the same
-> prompt for all LLMs, overlooking the fact that different LLMs might be best
-> suited to different prompts. Given the wide variety of possible prompt
-> formulations, automatically discovering the optimal prompt for each LLM
-> presents a significant challenge. Although there are methods for automated
-> prompt optimization in the natural language processing field, they struggle to
-> produce effective prompts for the test case generation task. First, the methods
-> iteratively optimize prompts by simply combining and mutating existing ones
-> without proper guidance, resulting in prompts that lack diversity and tend to
-> repeat the same errors in the generated test cases. Second, the prompts
-> generally lack domain contextual knowledge, limiting LLMs' performance in
-> the task.
-
-**Decision Explanation:**
-Original decision: ACCEPT
-The paper meets all 'MUST' criteria, focusing primarily on prompt engineering for LLMs, investigating automated optimization of textual input prompts to improve LLM performance in test case generation, and likely providing concrete examples replicable with publicly available LLMs, as implied by its title and abstract.
-
----
-
-## [Recursive Decomposition of Logical Thoughts: Framework for Superior
-  Reasoning and Knowledge Propagation in Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02026v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02026v1
-
-**Abstract:**
-> Enhancing the reasoning capabilities of Large Language Models remains a
-> critical challenge in artificial intelligence. We introduce RDoLT, Recursive
-> Decomposition of Logical Thought prompting, a novel framework that
-> significantly boosts LLM reasoning performance. RDoLT is built on three key
-> innovations: (1) recursively breaking down complex reasoning tasks into
-> sub-tasks of progressive complexity; (2) employing an advanced selection and
-> scoring mechanism to identify the most promising reasoning thoughts; and (3)
-> integrating a knowledge propagation module that mimics human learning by
-> keeping track of strong and weak thoughts for information propagation. Our
-> approach was evaluated across multiple benchmarks, including GSM8K, SVAMP,
-> MultiArith, LastLetterConcatenation, and Gaokao2023 Math. The results
-> demonstrate that RDoLT consistently outperforms existing state-of-the-art
-> techniques, achieving a 90.98 percent accuracy on GSM8K with ChatGPT-4,
-> surpassing state-of-the-art techniques by 6.28 percent. Similar improvements
-> were observed on other benchmarks, with accuracy gains ranging from 5.5 percent
-> to 6.75 percent. These findings highlight RDoLT's potential to advance prompt
-> engineering, offering a more effective and generalizable approach to complex
-> reasoning tasks.
-
-**Decision Explanation:**
-Original decision: ACCEPT
-Meets all 'MUST' criteria: (1) Focuses on prompt engineering for LLMs, (2) Investigates a novel method (RDoLT) to improve LLM performance through prompt manipulation, and (3) Provides concrete examples and benchmark results demonstrating the impact of RDoLT on LLM output, replicable with publicly available LLMs (ChatGPT-4).
-
----
-
-# Rejected Papers
-
-## [Beyond Text: Implementing Multimodal Large Language Model-Powered
-  Multi-Agent Systems Using a No-Code Platform](https://arxiv.org/abs/http://arxiv.org/abs/2501.00750v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00750v1
-
-**Abstract:**
-> This study proposes the design and implementation of a multimodal LLM-based
-> Multi-Agent System (MAS) leveraging a No-Code platform to address the practical
-> constraints and significant entry barriers associated with AI adoption in
-> enterprises. Advanced AI technologies, such as Large Language Models (LLMs),
-> often pose challenges due to their technical complexity and high implementation
-> costs, making them difficult for many organizations to adopt. To overcome these
-> limitations, this research develops a No-Code-based Multi-Agent System designed
-> to enable users without programming knowledge to easily build and manage AI
-> systems. The study examines various use cases to validate the applicability of
-> AI in business processes, including code generation from image-based notes,
-> Advanced RAG-based question-answering systems, text-based image generation, and
-> video generation using images and prompts. These systems lower the barriers to
-> AI adoption, empowering not only professional developers but also general users
-> to harness AI for significantly improved productivity and efficiency. By
-> demonstrating the scalability and accessibility of No-Code platforms, this
-> study advances the democratization of AI technologies within enterprises and
-> validates the practical applicability of Multi-Agent Systems, ultimately
-> contributing to the widespread adoption of AI across various industries.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on implementing a No-Code platform for Multi-Agent Systems using LLMs, not specifically on prompt engineering for text-based interactions with LLMs, failing to meet the core subject requirement. Prompt engineering is mentioned (e.g., using images and prompts for video generation) but is not the central focus.
-
----
-
-## [A3: Android Agent Arena for Mobile GUI Agents](https://arxiv.org/abs/http://arxiv.org/abs/2501.01149v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01149v1
-
-**Abstract:**
-> AI agents have become increasingly prevalent in recent years, driven by
-> significant advancements in the field of large language models (LLMs). Mobile
-> GUI agents, a subset of AI agents, are designed to autonomously perform tasks
-> on mobile devices. While numerous studies have introduced agents, datasets, and
-> benchmarks to advance mobile GUI agent research, many existing datasets focus
-> on static frame evaluations and fail to provide a comprehensive platform for
-> assessing performance on real-world, in-the-wild tasks. To address this gap, we
-> present Android Agent Arena (A3), a novel evaluation platform. Unlike existing
-> in-the-wild systems, A3 offers: (1) meaningful and practical tasks, such as
-> real-time online information retrieval and operational instructions; (2) a
-> larger, more flexible action space, enabling compatibility with agents trained
-> on any dataset; and (3) an automated business-level LLM-based evaluation process.
-> A3 includes 21 widely used general third-party apps and 201 tasks
-> representative of common user scenarios, providing a robust foundation for
-> evaluating mobile GUI agents in real-world situations and a new autonomous
-> evaluation process for less human labor and coding expertise. The project is
-> available at https://yuxiangchai.github.io/Android-Agent-Arena/.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a novel evaluation platform (Android Agent Arena) for mobile GUI agents, with LLMs mentioned only as part of the automated evaluation process, not as the central subject for prompt engineering. The core subject does not meet the 'MUST' criteria of focusing primarily on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Rethinking Relation Extraction: Beyond Shortcuts to Generalization with
-  a Debiased Benchmark](https://arxiv.org/abs/http://arxiv.org/abs/2501.01349v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01349v1
-
-**Abstract:**
-> Benchmarks are crucial for evaluating machine learning algorithm performance,
-> facilitating comparison and identifying superior solutions. However, biases
-> within datasets can lead models to learn shortcut patterns, resulting in
-> inaccurate assessments and hindering real-world applicability. This paper
-> addresses the issue of entity bias in relation extraction tasks, where models
-> tend to rely on entity mentions rather than context. We propose a debiased
-> relation extraction benchmark DREB that breaks the pseudo-correlation between
-> entity mentions and relation types through entity replacement. DREB utilizes
-> Bias Evaluator and PPL Evaluator to ensure low bias and high naturalness,
-> providing a reliable and accurate assessment of model generalization in entity
-> bias scenarios. To establish a new baseline on DREB, we introduce MixDebias, a
-> debiasing method combining data-level and model training-level techniques.
-> MixDebias effectively improves model performance on DREB while maintaining
-> performance on the original dataset. Extensive experiments demonstrate the
-> effectiveness and robustness of MixDebias compared to existing methods,
-> highlighting its potential for improving the generalization ability of relation
-> extraction models. We will release DREB and MixDebias publicly.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on debiasing relation extraction benchmarks and proposing a new debiasing method (MixDebias), rather than primarily investigating prompt engineering, optimizing textual input prompts for Large Language Models (LLMs), or demonstrating the impact of prompts on LLM output.
-
----
-
-## [ASKCOS: an open source software suite for synthesis planning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01835v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01835v1
-
-**Abstract:**
-> The advancement of machine learning and the availability of large-scale
-> reaction datasets have accelerated the development of data-driven models for
-> computer-aided synthesis planning (CASP) in the past decade. Here, we detail
-> the newest version of ASKCOS, an open source software suite for synthesis
-> planning that makes available several research advances in a freely available,
-> practical tool. Four one-step retrosynthesis models form the basis of both
-> interactive planning and automatic planning modes. Retrosynthetic planning is
-> complemented by other modules for feasibility assessment and pathway
-> evaluation, including reaction condition recommendation, reaction outcome
-> prediction, and auxiliary capabilities such as solubility prediction and
-> quantum mechanical descriptor prediction. ASKCOS has assisted hundreds of
-> medicinal, synthetic, and process chemists in their day-to-day tasks,
-> complementing expert decision making. It is our belief that CASP tools like
-> ASKCOS are an important part of modern chemistry research, and that they offer
-> ever-increasing utility and accessibility.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of an open source software suite for synthesis planning in chemistry, with no evident focus on prompt engineering for Large Language Models (LLMs) or improving LLM performance through textual input prompt manipulation.
-
----
-
-## [eRevise+RF: A Writing Evaluation System for Assessing Student Essay
-  Revisions and Providing Formative Feedback](https://arxiv.org/abs/http://arxiv.org/abs/2501.00715v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00715v1
-
-**Abstract:**
-> The ability to revise essays in response to feedback is important for
-> students' writing success. An automated writing evaluation (AWE) system that
-> supports students in revising their essays is thus essential. We present
-> eRevise+RF, an enhanced AWE system for assessing student essay revisions (e.g.,
-> changes made to an essay to improve its quality in response to essay feedback)
-> and providing revision feedback. We deployed the system with 6 teachers and 406
-> students across 3 schools in Pennsylvania and Louisiana. The results confirmed
-> its effectiveness in (1) assessing student essays in terms of evidence usage,
-> (2) extracting evidence and reasoning revisions across essays, and (3)
-> determining revision success in responding to feedback. The evaluation also
-> suggested eRevise+RF is a helpful system for young students to improve their
-> argumentative writing skills through revision and formative feedback.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development and evaluation of an automated writing evaluation (AWE) system for assessing student essay revisions, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output.
-
----
-
-## [Towards End-to-End Neuromorphic Voxel-based 3D Object Reconstruction
-  Without Physical Priors](https://arxiv.org/abs/http://arxiv.org/abs/2501.00741v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00741v1
-
-**Abstract:**
-> Neuromorphic cameras, also known as event cameras, are asynchronous
-> brightness-change sensors that can capture extremely fast motion without
-> suffering from motion blur, making them particularly promising for 3D
-> reconstruction in extreme environments. However, existing research on 3D
-> reconstruction using monocular neuromorphic cameras is limited, and most of the
-> methods rely on estimating physical priors and employ complex multi-step
-> pipelines. In this work, we propose an end-to-end method for dense voxel 3D
-> reconstruction using neuromorphic cameras that eliminates the need to estimate
-> physical priors. Our method incorporates a novel event representation to
-> enhance edge features, enabling the proposed feature-enhancement model to learn
-> more effectively. Additionally, we introduce the Optimal Binarization Threshold
-> Selection Principle as a guideline for future related work, using the optimal
-> reconstruction results achieved with threshold optimization as the benchmark.
-> Our method achieves a 54.6% improvement in reconstruction accuracy compared to
-> the baseline method.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on 3D object reconstruction using neuromorphic cameras, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria."
-}
-
----
-
-## [AttriReBoost: A Gradient-Free Propagation Optimization Method for Cold
-  Start Mitigation in Attribute Missing Graphs](https://arxiv.org/abs/http://arxiv.org/abs/2501.00743v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00743v1
-
-**Abstract:**
-> Missing attribute issues are prevalent in graph learning, leading to
-> biased outcomes in Graph Neural Networks (GNNs). Existing methods that rely on
-> feature propagation are prone to the cold start problem, particularly when dealing
-> with attribute resetting and low-degree nodes, which hinder effective
-> propagation and convergence. To address these challenges, we propose
-> AttriReBoost (ARB), a novel method that incorporates a propagation-based method
-> to mitigate cold start problems in attribute-missing graphs. ARB enhances
-> global feature propagation by redefining initial boundary conditions and
-> strategically integrating virtual edges, thereby improving node connectivity
-> and ensuring more stable and efficient convergence. This method facilitates
-> gradient-free attribute reconstruction with lower computational overhead. The
-> proposed method is theoretically grounded, with its convergence rigorously
-> established. Extensive experiments on several real-world benchmark datasets
-> demonstrate the effectiveness of ARB, achieving an average accuracy improvement
-> of 5.11% over state-of-the-art methods. Additionally, ARB exhibits remarkable
-> computational efficiency, processing a large-scale graph with 2.49 million
-> nodes in just 16 seconds on a single GPU. Our code is available at
-> https://github.com/limengran98/ARB.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on mitigating cold start problems in attribute-missing graphs for Graph Neural Networks (GNNs), which does not meet the 'MUST' criteria of primarily focusing on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [Enhancing Transformers for Generalizable First-Order Logical Entailment](https://arxiv.org/abs/http://arxiv.org/abs/2501.00759v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00759v1
-
-**Abstract:**
-> Transformers, as a fundamental deep learning architecture, have demonstrated
-> remarkable capabilities in reasoning. This paper investigates the generalizable
-> first-order logical reasoning ability of transformers with their parameterized
-> knowledge and explores ways to improve it. The first-order reasoning capability
-> of transformers is assessed through their ability to perform first-order
-> logical entailment, which is quantitatively measured by their performance in
-> answering knowledge graph queries. We establish connections between (1) two
-> types of distribution shifts studied in out-of-distribution generalization and
-> (2) the unseen knowledge and query settings discussed in the task of knowledge
-> graph query answering, enabling a characterization of fine-grained
-> generalizability. Results on our comprehensive dataset show that transformers
-> outperform previous methods specifically designed for this task and provide
-> detailed empirical evidence on the impact of input query syntax, token
-> embedding, and transformer architectures on the reasoning capability of
-> transformers. Interestingly, our findings reveal a mismatch between positional
-> encoding and other design choices in transformer architectures employed in
-> prior practices. This discovery motivates us to propose a more sophisticated,
-> logic-aware architecture, TEGA, to enhance the capability for generalizable
-> first-order logical entailment in transformers.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on enhancing transformer architectures for first-order logical entailment, which aligns with the development of new LLM architectures, rather than primarily focusing on prompt engineering for text-based interactions with existing LLMs.
-
----
-
-## [REM: A Scalable Reinforced Multi-Expert Framework for Multiplex
-  Influence Maximization](https://arxiv.org/abs/http://arxiv.org/abs/2501.00779v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00779v1
-
-**Abstract:**
-> In social online platforms, identifying influential seed users to maximize
-> influence spread is crucial as it can greatly diminish the cost and efforts
-> required for information dissemination. While effective, traditional methods
-> for Multiplex Influence Maximization (MIM) have reached their performance
-> limits, prompting the emergence of learning-based approaches. These novel
-> methods aim for better generalization and scalability for more sizable graphs
-> but face significant challenges, such as (1) inability to handle unknown
-> diffusion patterns and (2) reliance on high-quality training samples. To
-> address these issues, we propose the Reinforced Expert Maximization framework
-> (REM). REM leverages a Propagation Mixture of Experts technique to encode
-> dynamic propagation of large multiplex networks effectively in order to
-> generate enhanced influence propagation. Noticeably, REM treats a generative
-> model as a policy to autonomously generate different seed sets and learn how to
-> improve them from a Reinforcement Learning perspective. Extensive experiments
-> on several real-world datasets demonstrate that REM surpasses state-of-the-art
-> methods in terms of influence spread, scalability, and inference time in
-> influence maximization tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the criteria as it primarily focuses on developing a novel framework (REM) for Multiplex Influence Maximization, leveraging Reinforcement Learning and a generative model, but does not concentrate on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or demonstrate the impact of textual input prompts on LLM output.
-
----
-
-## [Make Shuffling Great Again: A Side-Channel Resistant Fisher-Yates
-  Algorithm for Protecting Neural Networks](https://arxiv.org/abs/http://arxiv.org/abs/2501.00798v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00798v1
-
-**Abstract:**
-> Neural network models implemented in embedded devices have been shown to be
-> susceptible to side-channel attacks (SCAs), allowing recovery of proprietary
-> model parameters, such as weights and biases. There are already available
-> countermeasure methods currently used for protecting cryptographic
-> implementations that can be tailored to protect embedded neural network models.
-> Shuffling, a hiding-based countermeasure that randomly shuffles the order of
-> computations, was shown to be vulnerable to SCA when the Fisher-Yates algorithm
-> is used. In this paper, we propose a design of an SCA-secure version of the
-> Fisher-Yates algorithm. By integrating the masking technique for modular
-> reduction and Blakely's method for modular multiplication, we effectively
-> remove the vulnerability in the division operation that led to side-channel
-> leakage in the original version of the algorithm. We experimentally evaluate
-> that the countermeasure is effective against SCA by implementing a correlation
-> power analysis attack on an embedded neural network model implemented on ARM
-> Cortex-M4. Compared to the original proposal, the memory overhead is $2\times$
-> the biggest layer of the network, while the time overhead varies from $4\%$ to
-> $0.49\%$ for a layer with $100$ and $1000$ neurons, respectively.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on protecting neural networks from side-channel attacks by modifying the Fisher-Yates algorithm, which does not meet the 'MUST' criteria: it does not primarily focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Reasoning-Oriented and Analogy-Based Methods for Locating and Editing in
-  Zero-Shot Event-Relational Reasoning](https://arxiv.org/abs/http://arxiv.org/abs/2501.00803v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00803v1
-
-**Abstract:**
-> Zero-shot event-relational reasoning is an important task in natural language
-> processing, and existing methods jointly learn a variety of event-relational
-> prefixes and inference-form prefixes to achieve such tasks. However, training
-> prefixes consumes large computational resources and lacks interpretability.
-> Additionally, learning various relational and inferential knowledge
-> inefficiently exploits the connections between tasks. Therefore, we first
-> propose a method for Reasoning-Oriented Locating and Editing (ROLE), which
-> locates and edits the key modules of the language model for reasoning about
-> event relations, enhancing interpretability and also resource-efficiently
-> optimizing the reasoning ability. Subsequently, we propose a method for
-> Analogy-Based Locating and Editing (ABLE), which efficiently exploits the
-> similarities and differences between tasks to optimize the zero-shot reasoning
-> capability. Experimental results show that ROLE improves interpretability and
-> reasoning performance with reduced computational cost. ABLE achieves SOTA
-> results in zero-shot reasoning.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on optimizing the language model's reasoning ability through module editing and analogy-based methods, rather than engineering or manipulating textual input prompts to improve Large Language Model (LLM) performance, thus not meeting the core subject requirement of prompt engineering for text-based LLM interactions.
-
----
-
-## [LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management](https://arxiv.org/abs/http://arxiv.org/abs/2501.00826v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00826v2
-
-**Abstract:**
-> Cryptocurrency investment is inherently difficult due to its shorter history
-> compared to traditional assets, the need to integrate vast amounts of data from
-> various modalities, and the requirement for complex reasoning. While deep
-> learning approaches have been applied to address these challenges, their
-> black-box nature raises concerns about trust and explainability. Recently,
-> large language models (LLMs) have shown promise in financial applications due
-> to their ability to understand multi-modal data and generate explainable
-> decisions. However, a single LLM faces limitations in complex, comprehensive
-> tasks such as asset investment. These limitations are even more pronounced in
-> cryptocurrency investment, where LLMs have less domain-specific knowledge in
-> their training corpora.
-> To overcome these challenges, we propose an explainable, multi-modal,
-> multi-agent framework for cryptocurrency investment. Our framework uses
-> specialized agents that collaborate within and across teams to handle subtasks
-> such as data analysis, literature integration, and investment decision-making
-> for the top 30 cryptocurrencies by market capitalization. The expert training
-> module fine-tunes agents using multi-modal historical data and professional
-> investment literature, while the multi-agent investment module employs
-> real-time data to make informed cryptocurrency investment decisions. Unique
-> intrateam and interteam collaboration mechanisms enhance prediction accuracy by
-> adjusting final predictions based on confidence levels within agent teams and
-> facilitating information sharing between teams. Empirical evaluation using data
-> from November 2023 to September 2024 demonstrates that our framework
-> outperforms single-agent models and market benchmarks in classification, asset
-> pricing, portfolio, and explainability performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a multi-agent system for automated crypto portfolio management using LLMs, rather than specifically engineering or optimizing prompts for LLMs, thus not meeting the core subject requirement.
-
----
-
-## [Embedding Style Beyond Topics: Analyzing Dispersion Effects Across
-  Different Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.00828v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00828v1
-
-**Abstract:**
-> This paper analyzes how writing style affects the dispersion of embedding
-> vectors across multiple, state-of-the-art language models. While early
-> transformer models primarily aligned with topic modeling, this study examines
-> the role of writing style in shaping embedding spaces. Using a literary corpus
-> that alternates between topics and styles, we compare the sensitivity of
-> language models across French and English. By analyzing the particular impact
-> of style on embedding dispersion, we aim to better understand how language
-> models process stylistic information, contributing to their overall
-> interpretability.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on analyzing the impact of writing style on embedding vector dispersion across language models, rather than specifically investigating prompt engineering techniques for improving Large Language Model (LLM) performance through textual input prompt manipulation."
-}
-
----
-
-## [An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component
-  Deep Learning Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.00829v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00829v1
-
-**Abstract:**
-> Multi-objective evolutionary algorithms (MOEAs) are widely used for searching
-> optimal solutions in complex multi-component applications. Traditional MOEAs
-> for multi-component deep learning (MCDL) systems face challenges in enhancing
-> the search efficiency while maintaining the diversity. To combat these, this
-> paper proposes $\mu$MOEA, the first LLM-empowered adaptive evolutionary search
-> algorithm to detect safety violations in MCDL systems. Inspired by the
-> context-understanding ability of Large Language Models (LLMs), $\mu$MOEA
-> promotes the LLM to comprehend the optimization problem and generate an initial
-> population tailored to evolutionary objectives. Subsequently, it employs adaptive
-> selection and variation to iteratively produce offspring, balancing the
-> evolutionary efficiency and diversity. During the evolutionary process, to
-> navigate away from the local optima, $\mu$MOEA integrates the evolutionary
-> experience back into the LLM. This utilization harnesses the LLM's quantitative
-> reasoning prowess to generate differential seeds, breaking away from current
-> optimal solutions. We evaluate $\mu$MOEA in finding safety violations of MCDL
-> systems, and compare its performance with state-of-the-art MOEA methods.
-> Experimental results show that $\mu$MOEA can significantly improve the
-> efficiency and diversity of the evolutionary search.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing an adaptive evolutionary algorithm for multi-component deep learning systems, utilizing LLMs as a component, rather than focusing specifically on prompt engineering for Large Language Models. Prompt engineering is not the central concern, but rather a means to enhance the algorithm's performance.
-
----
-
-## [LLM+AL: Bridging Large Language Models and Action Languages for Complex
-  Reasoning about Actions](https://arxiv.org/abs/http://arxiv.org/abs/2501.00830v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00830v1
-
-**Abstract:**
-> Large Language Models (LLMs) have made significant strides in various
-> intelligent tasks but still struggle with complex action reasoning tasks that
-> require systematic search. To address this limitation, we propose a method that
-> bridges the natural language understanding capabilities of LLMs with the
-> symbolic reasoning strengths of action languages. Our approach, termed
-> "LLM+AL," leverages the LLM's strengths in semantic parsing and commonsense
-> knowledge generation alongside the action language's proficiency in automated
-> reasoning based on encoded knowledge. We compare LLM+AL against
-> state-of-the-art LLMs, including ChatGPT-4, Claude 3 Opus, Gemini Ultra 1.0,
-> and o1-preview, using benchmarks for complex reasoning about actions. Our
-> findings indicate that, although all methods exhibit errors, LLM+AL, with
-> relatively minimal human corrections, consistently leads to correct answers,
-> whereas standalone LLMs fail to improve even with human feedback. LLM+AL also
-> contributes to automated generation of action languages.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on integrating LLMs with Action Languages for complex reasoning, rather than prompt engineering specifically for LLMs, and does not provide concrete examples of textual input prompts and their impact on LLM output."
-}
-
----
-
-## [Distilled Lifelong Self-Adaptation for Configurable Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.00840v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00840v1
-
-**Abstract:**
-> Modern configurable systems provide tremendous opportunities for engineering
-> future intelligent software systems. A key difficulty thereof is how to
-> effectively self-adapt the configuration of a running system such that its
-> performance (e.g., runtime and throughput) can be optimized under time-varying
-> workloads. This unfortunately remains unaddressed in existing approaches as
-> they either overlook the available past knowledge or rely on static
-> exploitation of past knowledge without reasoning about the usefulness of information
-> when planning for self-adaptation. In this paper, we tackle this challenging
-> problem by proposing DLiSA, a framework that self-adapts configurable systems.
-> DLiSA comes with two properties: firstly, it supports lifelong planning, and
-> thereby the planning process runs continuously throughout the lifetime of the
-> system, allowing dynamic exploitation of the accumulated knowledge for rapid
-> adaptation. Secondly, the planning for a newly emerged workload is boosted via
-> distilled knowledge seeding, in which the knowledge is dynamically purified
-> such that only useful past configurations are seeded when necessary, mitigating
-> misleading information. Extensive experiments suggest that the proposed DLiSA
-> significantly outperforms state-of-the-art approaches, demonstrating a
-> performance improvement of up to 229% and a resource acceleration of up to
-> 2.22x on generating promising adaptation configurations. All data and sources
-> can be found at our repository: https://github.com/ideas-labo/dlisa.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the primary focus criteria, as it concentrates on self-adaptation for configurable systems' performance optimization, lacking any mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, and instead focuses on system configuration and workload adaptation."
-}
-
----
-
-## [Diversity Optimization for Travelling Salesman Problem via Deep
-  Reinforcement Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.00884v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00884v1
-
-**Abstract:**
-> Existing neural methods for the Travelling Salesman Problem (TSP) mostly aim
-> at finding a single optimal solution. To discover diverse yet high-quality
-> solutions for Multi-Solution TSP (MSTSP), we propose a novel deep reinforcement
-> learning based neural solver, which is primarily featured by an encoder-decoder
-> structured policy. Concretely, on the one hand, a Relativization Filter (RF) is
-> designed to enhance the robustness of the encoder to affine transformations of
-> the instances, so as to potentially improve the quality of the found solutions.
-> On the other hand, a Multi-Attentive Adaptive Active Search (MA3S) is tailored
-> to allow the decoders to strike a balance between the optimality and diversity.
-> Experimental evaluations on benchmark instances demonstrate the superiority of
-> our method over recent neural baselines across different metrics, and its
-> competitive performance against state-of-the-art traditional heuristics with
-> significantly reduced computational time, ranging from $1.3\times$ to
-> $15\times$ faster. Furthermore, we demonstrate that our method can also be
-> applied to the Capacitated Vehicle Routing Problem (CVRP).
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not focus on prompt engineering for Large Language Models (LLMs) as required. Instead, it explores deep reinforcement learning for solving the Travelling Salesman Problem and Capacitated Vehicle Routing Problem, which does not meet the primary criteria of prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-## [Population Aware Diffusion for Time Series Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.00910v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00910v1
-
-**Abstract:**
-> Diffusion models have shown promising ability in generating high-quality time
-> series (TS) data. Despite the initial success, existing works mostly focus on
-> the authenticity of data at the individual level, but pay less attention to
-> preserving the population-level properties on the entire dataset. Such
-> population-level properties include value distributions for each dimension and
-> distributions of certain functional dependencies (e.g., cross-correlation, CC)
-> between different dimensions. For instance, when generating house energy
-> consumption TS data, the value distributions of the outside temperature and the
-> kitchen temperature should be preserved, as well as the distribution of CC
-> between them. Preserving such TS population-level properties is critical in
-> maintaining the statistical insights of the datasets, mitigating model bias,
-> and augmenting downstream tasks like TS prediction. Yet, it is often overlooked
-> by existing models. Hence, data generated by existing models often bear
-> distribution shifts from the original data. We propose Population-aware
-> Diffusion for Time Series (PaD-TS), a new TS generation model that better
-> preserves the population-level properties. The key novelties of PaD-TS include
-> 1) a new training method explicitly incorporating TS population-level property
-> preservation, and 2) a new dual-channel encoder model architecture that better
-> captures the TS data structure. Empirical results in major benchmark datasets
-> show that PaD-TS can improve the average CC distribution shift score between
-> real and synthetic data by 5.9x while maintaining a performance comparable to
-> state-of-the-art models on individual-level authenticity.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on developing a new diffusion model for time series generation, emphasizing population-level property preservation, without any mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria."
-}
-
----
-
-## [$β$-DQN: Improving Deep Q-Learning By Evolving the Behavior](https://arxiv.org/abs/http://arxiv.org/abs/2501.00913v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00913v1
-
-**Abstract:**
-> While many sophisticated exploration methods have been proposed, their lack
-> of generality and high computational cost often lead researchers to favor
-> simpler methods like $\epsilon$-greedy. Motivated by this, we introduce
-> $\beta$-DQN, a simple and efficient exploration method that augments the
-> standard DQN with a behavior function $\beta$. This function estimates the
-> probability that each action has been taken at each state. By leveraging
-> $\beta$, we generate a population of diverse policies that balance exploration
-> between state-action coverage and overestimation bias correction. An adaptive
-> meta-controller is designed to select an effective policy for each episode,
-> enabling flexible and explainable exploration. $\beta$-DQN is straightforward
-> to implement and adds minimal computational overhead to the standard DQN.
-> Experiments on both simple and challenging exploration domains show that
-> $\beta$-DQN outperforms existing baseline methods across a wide range of tasks,
-> providing an effective solution for improving exploration in deep reinforcement
-> learning.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on improving Deep Q-Learning with a new exploration method, not on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Incremental Dialogue Management: Survey, Discussion, and Implications
-  for HRI](https://arxiv.org/abs/http://arxiv.org/abs/2501.00953v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00953v1
-
-**Abstract:**
-> Efforts towards endowing robots with the ability to speak have benefited from
-> recent advancements in NLP, in particular large language models. However, as
-> powerful as current models have become, they still operate on sentence or
-> multi-sentence level input, not on the word-by-word input that humans operate
-> on, affecting the degree of responsiveness that they offer, which is critical
-> in situations where humans interact with robots using speech. In this paper, we
-> review the literature on interactive systems that operate incrementally (i.e.,
-> at the word level or below it). We motivate the need for incremental systems
-> and survey incremental modeling of important aspects of dialogue like speech
-> recognition and language generation. The primary focus is on the part of the system
-> that makes decisions, known as the dialogue manager. We find that there is very
-> little research on incremental dialogue management, offer some requirements for
-> practical incremental dialogue management, and discuss the implications of
-> incremental dialogue for embodied, robotic platforms.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on incremental dialogue management for Human-Robot Interaction (HRI), not specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [Are LLMs effective psychological assessors? Leveraging adaptive RAG for
-  interpretable mental health screening through psychometric practice](https://arxiv.org/abs/http://arxiv.org/abs/2501.00982v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00982v1
-
-**Abstract:**
-> In psychological practice, standardized questionnaires serve as essential
-> tools for assessing mental constructs (e.g., attitudes, traits, and emotions)
-> through structured questions (aka items). With the increasing prevalence of
-> social media platforms where users share personal experiences and emotions,
-> researchers are exploring computational methods to leverage this data for rapid
-> mental health screening. In this study, we propose a novel adaptive
-> Retrieval-Augmented Generation (RAG) approach that completes psychological
-> questionnaires by analyzing social media posts. Our method retrieves the most
-> relevant user posts for each question in a psychological survey and uses Large
-> Language Models (LLMs) to predict questionnaire scores in a zero-shot setting.
-> Our findings are twofold. First, we demonstrate that this approach can
-> effectively predict users' responses to psychological questionnaires, such as
-> the Beck Depression Inventory II (BDI-II), achieving performance comparable to
-> or surpassing state-of-the-art models on Reddit-based benchmark datasets
-> without relying on training data. Second, we show how this methodology can be
-> generalized as a scalable screening tool, as the final assessment is
-> systematically derived by completing standardized questionnaires and tracking
-> how individual item responses contribute to the diagnosis, aligning with
-> established psychometric practices.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is leveraging LLMs for mental health screening via social media analysis, rather than prompt engineering for text-based interactions with LLMs. Prompt manipulation is not the central concern, but rather a means to achieve psychological assessment."
-}
-
----
-
-## [Bootstrapped Reward Shaping](https://arxiv.org/abs/http://arxiv.org/abs/2501.00989v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00989v1
-
-**Abstract:**
-> In reinforcement learning, especially in sparse-reward domains, many
-> environment steps are required to observe reward information. In order to
-> increase the frequency of such observations, "potential-based reward shaping"
-> (PBRS) has been proposed as a method of providing a more dense reward signal
-> while leaving the optimal policy invariant. However, the required "potential
-> function" must be carefully designed with task-dependent knowledge to not deter
-> training performance. In this work, we propose a "bootstrapped" method of
-> reward shaping, termed BSRS, in which the agent's current estimate of the
-> state-value function acts as the potential function for PBRS. We provide
-> convergence proofs for the tabular setting, give insights into training
-> dynamics for deep RL, and show that the proposed method improves training speed
-> in the Atari suite.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on reinforcement learning, proposing a method for reward shaping in sparse-reward domains, and does not meet any of the 'MUST' criteria, particularly lacking focus on prompt engineering, design, or optimization for Large Language Models (LLMs), and instead pertains to a different area of AI research.
-
----
-
-## [Exploring Information Processing in Large Language Models: Insights from
-  Information Bottleneck Theory](https://arxiv.org/abs/http://arxiv.org/abs/2501.00999v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00999v2
-
-**Abstract:**
-> Large Language Models (LLMs) have demonstrated remarkable performance across
-> a wide range of tasks by understanding input information and predicting
-> corresponding outputs. However, the internal mechanisms by which LLMs
-> comprehend input and make effective predictions remain poorly understood. In
-> this paper, we explore the working mechanism of LLMs in information processing
-> from the perspective of Information Bottleneck Theory. We propose a
-> non-training construction strategy to define a task space and identify the
-> following key findings: (1) LLMs compress input information into specific task
-> spaces (e.g., sentiment space, topic space) to facilitate task understanding;
-> (2) they then extract and utilize relevant information from the task space at
-> critical moments to generate accurate predictions. Based on these insights, we
-> introduce two novel approaches: an Information Compression-based Context
-> Learning (IC-ICL) and a Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances
-> reasoning performance and inference efficiency by compressing retrieved example
-> information into the task space. TS-FT employs a space-guided loss to fine-tune
-> LLMs, encouraging the learning of more effective compression and selection
-> mechanisms. Experiments across multiple datasets validate the effectiveness of
-> task space construction. Additionally, IC-ICL not only improves performance but
-> also accelerates inference speed by over 40\%, while TS-FT achieves superior
-> results with a minimal strategy adjustment.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on understanding LLM's internal information processing mechanisms through Information Bottleneck Theory and proposes methods for fine-tuning (TS-FT) and context learning (IC-ICL), which do not meet the 'MUST' criteria of focusing on the engineering, design, or optimization of prompts specifically for LLMs, nor does it provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [Deep Reinforcement Learning for Job Scheduling and Resource Management
-  in Cloud Computing: An Algorithm-Level Review](https://arxiv.org/abs/http://arxiv.org/abs/2501.01007v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01007v1
-
-**Abstract:**
-> Cloud computing has revolutionized the provisioning of computing resources,
-> offering scalable, flexible, and on-demand services to meet the diverse
-> requirements of modern applications. At the heart of efficient cloud operations
-> are job scheduling and resource management, which are critical for optimizing
-> system performance and ensuring timely and cost-effective service delivery.
-> However, the dynamic and heterogeneous nature of cloud environments presents
-> significant challenges for these tasks, as workloads and resource availability
-> can fluctuate unpredictably. Traditional approaches, including heuristic and
-> meta-heuristic algorithms, often struggle to adapt to these real-time changes
-> due to their reliance on static models or predefined rules. Deep Reinforcement
-> Learning (DRL) has emerged as a promising solution to these challenges by
-> enabling systems to learn and adapt policies based on continuous observations
-> of the environment, facilitating intelligent and responsive decision-making.
-> This survey provides a comprehensive review of DRL-based algorithms for job
-> scheduling and resource management in cloud computing, analyzing their
-> methodologies, performance metrics, and practical applications. We also
-> highlight emerging trends and future research directions, offering valuable
-> insights into leveraging DRL to advance both job scheduling and resource
-> management in cloud computing.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on Deep Reinforcement Learning for job scheduling and resource management in cloud computing, with no mention of Large Language Models (LLMs), prompt engineering, or textual input manipulation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based
-  on Large language Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.01014v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01014v1
-
-**Abstract:**
-> The exponential growth of data and advancements in big data technologies have
-> created a demand for more efficient and automated approaches to data analysis
-> and storytelling. However, automated data analysis systems still face
-> challenges in leveraging large language models (LLMs) for data insight
-> discovery, augmented analysis, and data storytelling. This paper introduces the
-> Multidimensional Data Storytelling Framework (MDSF) based on large language
-> models for automated insight generation and context-aware storytelling. The
-> framework incorporates advanced preprocessing techniques, augmented analysis
-> algorithms, and a unique scoring mechanism to identify and prioritize
-> actionable insights. The use of fine-tuned LLMs enhances contextual
-> understanding and generates narratives with minimal manual intervention. The
-> architecture also includes an agent-based mechanism for real-time storytelling
-> continuation control. Key findings reveal that MDSF outperforms existing
-> methods across various datasets in terms of insight ranking accuracy,
-> descriptive quality, and narrative coherence. The experimental evaluation
-> demonstrates MDSF's ability to automate complex analytical tasks, reduce
-> interpretive biases, and improve user satisfaction. User studies further
-> underscore its practical utility in enhancing content structure, conclusion
-> extraction, and richness of detail.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on developing a multidimensional data storytelling framework using LLMs as a component, rather than centered on prompt engineering for LLMs. The core subject is the framework's architecture and performance, not novel prompt engineering techniques or systematic prompt development for text-based interactions with LLMs."
-}
-
----
-
-## [Towards Adversarially Robust Deep Metric Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01025v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01025v2
-
-**Abstract:**
-> Deep Metric Learning (DML) has shown remarkable successes in many domains by
-> taking advantage of powerful deep neural networks. Deep neural networks are
-> prone to adversarial attacks and could be easily fooled by adversarial
-> examples. The current progress on this robustness issue is mainly about deep
-> classification models but pays little attention to DML models. Existing works
-> fail to thoroughly inspect the robustness of DML and neglect an important DML
-> scenario, the clustering-based inference. In this work, we first point out the
-> robustness issue of DML models in clustering-based inference scenarios. We find
-> that, for the clustering-based inference, existing defenses designed for DML are
-> unable to be reused and the adaptions of defenses designed for deep
-> classification models cannot achieve satisfactory robustness performance. To
-> alleviate the hazard of adversarial examples, we propose a new defense, the
-> Ensemble Adversarial Training (EAT), which exploits ensemble learning and
-> adversarial training. EAT promotes the diversity of the ensemble, encouraging
-> each model in the ensemble to have different robustness features, and employs a
-> self-transferring mechanism to make full use of the robustness statistics of
-> the whole ensemble in the update of every single model. We evaluate the EAT
-> method on three widely-used datasets with two popular model architectures. The
-> results show that the proposed EAT method greatly outperforms the adaptions of
-> defenses designed for deep classification models.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs) but instead addresses Deep Metric Learning (DML) and its robustness against adversarial attacks, with no mention of LLMs, text generation, or prompt manipulation.
-
----
-
-## [Reasoning based on symbolic and parametric knowledge bases: a survey](https://arxiv.org/abs/http://arxiv.org/abs/2501.01030v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01030v1
-
-**Abstract:**
-> Reasoning is fundamental to human intelligence, and critical for
-> problem-solving, decision-making, and critical thinking. Reasoning refers to
-> drawing new conclusions based on existing knowledge, which can support various
-> applications like clinical diagnosis, basic education, and financial analysis.
-> Though a good number of surveys have been proposed for reviewing
-> reasoning-related methods, none of them has systematically investigated these
-> methods from the viewpoint of their dependent knowledge base. Both the
-> scenarios to which the knowledge bases are applied and their storage formats
-> are significantly different. Hence, investigating reasoning methods from the
-> knowledge base perspective helps us better understand the challenges and future
-> directions. To fill this gap, this paper first classifies the knowledge base
-> into symbolic and parametric ones. The former explicitly stores information in
-> human-readable symbols, and the latter implicitly encodes knowledge within
-> parameters. Then, we provide a comprehensive overview of reasoning methods
-> using symbolic knowledge bases, parametric knowledge bases, and both of them.
-> Finally, we identify the future direction toward enhancing reasoning
-> capabilities to bridge the gap between human and machine intelligence.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs), but rather on a survey of reasoning methods based on symbolic and parametric knowledge bases, without explicit mention of prompt engineering or LLMs.
-
----
-
-## [MSWA: Refining Local Attention with Multi-ScaleWindow Attention](https://arxiv.org/abs/http://arxiv.org/abs/2501.01039v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01039v1
-
-**Abstract:**
-> Transformer-based LLMs have achieved exceptional performance across a wide
-> range of NLP tasks. However, the standard self-attention mechanism suffers from
-> quadratic time complexity and linearly increased cache size. Sliding window
-> attention (SWA) solves this problem by restricting the attention range to a
-> fixed-size local context window. Nevertheless, SWA employs a uniform window
-> size for each head in each layer, making it inefficient in capturing context of
-> varying scales. To mitigate this limitation, we propose Multi-Scale Window
-> Attention (MSWA) which applies diverse window sizes across heads and layers in
-> the Transformer. It not only allows for different window sizes among heads
-> within the same layer but also progressively increases window size allocation
-> from shallow to deep layers, thus enabling the model to capture contextual
-> information with different lengths and distances. Experimental results on
-> language modeling and common-sense reasoning tasks substantiate that MSWA
-> outperforms traditional local attention in both effectiveness and efficiency.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on refining the local attention mechanism in Transformer-based LLMs through Multi-Scale Window Attention, which pertains to the development of new LLM architecture/training methods, not prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Risks of Cultural Erasure in Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01056v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01056v1
-
-**Abstract:**
-> Large language models are increasingly being integrated into applications
-> that shape the production and discovery of societal knowledge such as search,
-> online education, and travel planning. As a result, language models will shape
-> how people learn about, perceive and interact with global cultures making it
-> important to consider whose knowledge systems and perspectives are represented
-> in models. Recognizing this importance, increasingly, work in Machine Learning
-> and NLP has focused on evaluating gaps in global cultural representational
-> distribution within outputs. However, more work is needed on developing
-> benchmarks for cross-cultural impacts of language models that stem from a
-> nuanced sociologically-aware conceptualization of cultural impact or harm. We
-> join this line of work arguing for the need of metricizable evaluations of
-> language technologies that interrogate and account for historical power
-> inequities and differential impacts of representation on global cultures,
-> particularly for cultures already under-represented in the digital corpora. We
-> look at two concepts of erasure: omission, where cultures are not represented
-> at all, and simplification, where cultural complexity is erased by presenting
-> one-dimensional views of a rich culture. The former focuses on whether
-> something is represented, and the latter on how it is represented. We focus our
-> analysis on two task contexts with the potential to influence global cultural
-> production. First, we probe representations that a language model produces
-> about different places around the world when asked to describe these contexts.
-> Second, we analyze the cultures represented in the travel recommendations
-> produced by a set of language model applications. Our study shows ways in which
-> the NLP community and application developers can begin to operationalize
-> complex socio-cultural considerations into standard evaluations and benchmarks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on evaluating cultural representation and potential erasure in Large Language Models, but does not primarily investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts, nor does it provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [Graph Generative Pre-trained Transformer](https://arxiv.org/abs/2501.01073v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01073v1
-
-**Abstract:**
-> Graph generation is a critical task in numerous domains, including molecular
-> design and social network analysis, due to its ability to model complex
-> relationships and structured data. While most modern graph generative models
-> utilize adjacency matrix representations, this work revisits an alternative
-> approach that represents graphs as sequences of node set and edge set. We
-> advocate for this approach due to its efficient encoding of graphs and propose
-> a novel representation. Based on this representation, we introduce the Graph
-> Generative Pre-trained Transformer (G2PT), an auto-regressive model that learns
-> graph structures via next-token prediction. To further exploit G2PT's
-> capabilities as a general-purpose foundation model, we explore fine-tuning
-> strategies for two downstream applications: goal-oriented generation and graph
-> property prediction. We conduct extensive experiments across multiple datasets.
-> Results indicate that G2PT achieves superior generative performance on both
-> generic graph and molecule datasets. Furthermore, G2PT exhibits strong
-> adaptability and versatility in downstream tasks from molecular design to
-> property prediction.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on the development of a new graph generative model (Graph Generative Pre-trained Transformer) and its fine-tuning strategies for downstream applications, rather than primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs).
-
----
-
-## [BatStyler: Advancing Multi-category Style Generation for Source-free
-  Domain Generalization](https://arxiv.org/abs/2501.01109v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01109v1
-
-**Abstract:**
-> Source-Free Domain Generalization (SFDG) aims to develop a model that
-> performs on unseen domains without relying on any source domains. However, the
-> implementation remains constrained due to the unavailability of training data.
-> Research on SFDG focuses on knowledge transfer of multi-modal models and style
-> synthesis based on joint space of multiple modalities, thus eliminating the
-> dependency on source domain images. However, existing works primarily work for
-> multi-domain and less-category configuration, but performance on multi-domain
-> and multi-category configuration is relatively poor. In addition, the
-> efficiency of style synthesis also deteriorates in multi-category scenarios.
-> How to efficiently synthesize sufficiently diverse data and apply it to
-> multi-category configuration is a direction with greater practical value. In
-> this paper, we propose a method called BatStyler, which is utilized to improve
-> the capability of style synthesis in multi-category scenarios. BatStyler
-> consists of two modules: Coarse Semantic Generation and Uniform Style
-> Generation modules. The Coarse Semantic Generation module extracts
-> coarse-grained semantics to prevent the compression of space for style
-> diversity learning in multi-category configuration, while the Uniform Style
-> Generation module provides a template of styles that are uniformly distributed
-> in space and implements parallel training. Extensive experiments demonstrate
-> that our method exhibits comparable performance on less-category datasets,
-> while surpassing state-of-the-art methods on multi-category datasets.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on Source-Free Domain Generalization for multi-category style generation, not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not provide concrete examples of prompts impacting LLM output.
-
----
-
-## [MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic
-  Forgetting in Malware Classification](https://arxiv.org/abs/2501.01110v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01110v1
-
-**Abstract:**
-> Continual Learning (CL) for malware classification tackles the rapidly
-> evolving nature of malware threats and the frequent emergence of new types.
-> Generative Replay (GR)-based CL systems utilize a generative model to produce
-> synthetic versions of past data, which are then combined with new data to
-> retrain the primary model. Traditional machine learning techniques in this
-> domain often struggle with catastrophic forgetting, where a model's performance
-> on old data degrades over time.
-> In this paper, we introduce a GR-based CL system that employs Generative
-> Adversarial Networks (GANs) with feature matching loss to generate high-quality
-> malware samples. Additionally, we implement innovative selection schemes for
-> replay samples based on the model's hidden representations.
-> Our comprehensive evaluation across Windows and Android malware datasets in a
-> class-incremental learning scenario -- where new classes are introduced
-> continuously over multiple tasks -- demonstrates substantial performance
-> improvements over previous methods. For example, our system achieves an average
-> accuracy of 55% on Windows malware samples, significantly outperforming other
-> GR-based models by 28%. This study provides practical insights for advancing
-> GR-based malware classification systems. The implementation is available at
-> https://github.com/MalwareReplayGAN/MalCL (the code will be made public upon
-> the presentation of the paper).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on leveraging GAN-based Generative Replay for Continual Learning in malware classification, which does not meet the 'MUST' criteria of primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [Pruning-based Data Selection and Network Fusion for Efficient Deep
-  Learning](https://arxiv.org/abs/2501.01118v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01118v1
-
-**Abstract:**
-> Efficient data selection is essential for improving the training efficiency
-> of deep neural networks and reducing the associated annotation costs. However,
-> traditional methods tend to be computationally expensive, limiting their
-> scalability and real-world applicability. We introduce PruneFuse, a novel
-> method that combines pruning and network fusion to enhance data selection and
-> accelerate network training. In PruneFuse, the original dense network is pruned
-> to generate a smaller surrogate model that efficiently selects the most
-> informative samples from the dataset. Once this iterative data selection
-> selects sufficient samples, the insights learned from the pruned model are
-> seamlessly integrated with the dense model through network fusion, providing an
-> optimized initialization that accelerates training. Extensive experimentation
-> on various datasets demonstrates that PruneFuse significantly reduces
-> computational costs for data selection, achieves better performance than
-> baselines, and accelerates the overall training process.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on improving deep neural network training efficiency through pruning and network fusion, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal
-  Learning](https://arxiv.org/abs/2501.01120v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01120v1
-
-**Abstract:**
-> Multimodal learning with incomplete modality is practical and challenging.
-> Recently, researchers have focused on enhancing the robustness of pre-trained
-> MultiModal Transformers (MMTs) under missing modality conditions by applying
-> learnable prompts. However, these prompt-based methods face several
-> limitations: (1) incomplete modalities provide restricted modal cues for
-> task-specific inference, (2) dummy imputation for missing content causes
-> information loss and introduces noise, and (3) static prompts are
-> instance-agnostic, offering limited knowledge for instances with various
-> missing conditions. To address these issues, we propose RAGPT, a novel
-> Retrieval-AuGmented dynamic Prompt Tuning framework. RAGPT comprises three
-> modules: (I) the multi-channel retriever, which identifies similar instances
-> through a within-modality retrieval strategy, (II) the missing modality
-> generator, which recovers missing information using retrieved contexts, and
-> (III) the context-aware prompter, which captures contextual knowledge from
-> relevant instances and generates dynamic prompts to largely enhance the MMT's
-> robustness. Extensive experiments conducted on three real-world datasets show
-> that RAGPT consistently outperforms all competitive baselines in handling
-> incomplete modality problems. The code of our work and prompt-based baselines
-> is available at https://github.com/Jian-Lang/RAGPT.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing the robustness of MultiModal Transformers (MMTs) for incomplete multimodal learning, with dynamic prompt tuning being a component of the proposed framework, rather than the core subject. The main focus is on addressing multimodal learning challenges, not specifically on prompt engineering for text-based interactions with Large Language Models (LLMs).
-
----
-
-## [Deep Learning in Palmprint Recognition-A Comprehensive Survey](https://arxiv.org/abs/2501.01166v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01166v1
-
-**Abstract:**
-> Palmprint recognition has emerged as a prominent biometric technology, widely
-> applied in diverse scenarios. Traditional handcrafted methods for palmprint
-> recognition often fall short in representation capability, as they heavily
-> depend on researchers' prior knowledge. Deep learning (DL) has been introduced
-> to address this limitation, leveraging its remarkable successes across various
-> domains. While existing surveys focus narrowly on specific tasks within
-> palmprint recognition (often grounded in traditional methodologies), there remains
-> a significant gap in comprehensive research exploring DL-based approaches
-> across all facets of palmprint recognition. This paper bridges that gap by
-> thoroughly reviewing recent advancements in DL-powered palmprint recognition.
-> The paper systematically examines progress across key tasks, including
-> region-of-interest segmentation, feature extraction, and
-> security/privacy-oriented challenges. Beyond highlighting these advancements,
-> the paper identifies current challenges and uncovers promising opportunities
-> for future research. By consolidating state-of-the-art progress, this review
-> serves as a valuable resource for researchers, enabling them to stay abreast of
-> cutting-edge technologies and drive innovation in palmprint recognition.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on deep learning in palmprint recognition, a biometric technology unrelated to Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes
-  in Benchmark Datasets](https://arxiv.org/abs/2501.01168v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01168v1
-
-**Abstract:**
-> The multifaceted challenge of accurately measuring gender stereotypical bias
-> in language models is akin to discerning different segments of a broader,
-> unseen entity. This short paper primarily focuses on intrinsic bias mitigation
-> and measurement strategies for language models, building on prior research that
-> demonstrates a lack of correlation between intrinsic and extrinsic approaches.
-> We delve deeper into intrinsic measurements, identifying inconsistencies and
-> suggesting that these benchmarks may reflect different facets of gender
-> stereotype. Our methodology involves analyzing data distributions across
-> datasets and integrating gender stereotype components informed by social
-> psychology. By adjusting the distribution of two datasets, we achieve a better
-> alignment of outcomes. Our findings underscore the complexity of gender
-> stereotyping in language models and point to new directions for developing more
-> refined techniques to detect and reduce bias.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on measuring and mitigating gender stereotypical bias in language models, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild](https://arxiv.org/abs/2501.01174v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01174v1
-
-**Abstract:**
-> While 2D pose estimation has advanced our ability to interpret body movements
-> in animals and primates, it is limited by the lack of depth information,
-> constraining its application range. 3D pose estimation provides a more
-> comprehensive solution by incorporating spatial depth, yet creating extensive
-> 3D pose datasets for animals is challenging due to their dynamic and
-> unpredictable behaviours in natural settings. To address this, we propose a
-> hybrid approach that utilizes rigged avatars and the pipeline to generate
-> synthetic datasets to acquire the necessary 3D annotations for training. Our
-> method introduces a simple attention-based MLP network for converting 2D poses
-> to 3D, designed to be independent of the input image to ensure scalability for
-> poses in natural environments. Additionally, we identify that existing
-> anatomical keypoint detectors are insufficient for accurate pose retargeting
-> onto arbitrary avatars. To overcome this, we present a lookup table based on a
-> deep pose estimation method using a synthetic collection of diverse actions
-> rigged avatars perform. Our experiments demonstrate the effectiveness and
-> efficiency of this lookup table-based retargeting approach. Overall, we propose
-> a comprehensive framework with systematically synthesized datasets for lifting
-> poses from 2D to 3D and then utilize this to re-target motion from wild
-> settings onto arbitrary avatars.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on 3D pose estimation for avatars using computer vision and machine learning techniques, with no primary emphasis on the engineering, design, or optimization of prompts for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Data Augmentation Techniques for Chinese Disease Name Normalization](https://arxiv.org/abs/2501.01195v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01195v1
-
-**Abstract:**
-> Disease name normalization is an important task in the medical domain. It
-> classifies disease names written in various formats into standardized names,
-> serving as a fundamental component in smart healthcare systems for various
-> disease-related functions. Nevertheless, the most significant obstacle to
-> existing disease name normalization systems is the severe shortage of training
-> data. Consequently, we present a novel data augmentation approach that includes
-> a series of data augmentation techniques and some supporting modules to help
-> mitigate the problem. Through extensive experimentation, we illustrate that our
-> proposed approach exhibits significant performance improvements across various
-> baseline models and training objectives, particularly in scenarios with limited
-> training data.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the medical domain (disease name normalization) and data augmentation techniques for training models, rather than on prompt engineering for Large Language Models (LLMs) or manipulating textual input prompts to improve LLM performance.
-
----
-
-## [A redescription mining framework for post-hoc explaining and relating
-  deep learning models](https://arxiv.org/abs/2501.01209v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01209v1
-
-**Abstract:**
-> Deep learning models (DLMs) achieve increasingly high performance both on
-> structured and unstructured data. They have significantly extended the
-> applicability of machine learning to various domains. Their success in making
-> predictions, detecting patterns and generating new data has made a significant impact on science
-> and industry. Despite these accomplishments, DLMs are difficult to explain
-> because of their enormous size. In this work, we propose a novel framework for
-> post-hoc explaining and relating DLMs using redescriptions. The framework
-> allows cohort analysis of arbitrary DLMs by identifying statistically
-> significant redescriptions of neuron activations. It allows coupling neurons to
-> a set of target labels or sets of descriptive attributes, relating layers
-> within a single DLM or associating different DLMs. The proposed framework is
-> independent of the artificial neural network architecture and can work with
-> more complex target labels (e.g. multi-label or multi-target scenario).
-> Additionally, it can emulate both pedagogical and decompositional approach to
-> rule extraction. The aforementioned properties of the proposed framework can
-> increase explainability and interpretability of arbitrary DLMs by providing
-> different information compared to existing explainable-AI approaches.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on explaining and relating deep learning models (DLMs) in general, using redescriptions, without a specific emphasis on Large Language Models (LLMs), prompt engineering, or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [An Efficient Attention Mechanism for Sequential Recommendation Tasks:
-  HydraRec](https://arxiv.org/abs/2501.01242v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01242v1
-
-**Abstract:**
-> Transformer based models are increasingly being used in various domains
-> including recommender systems (RS). Pretrained transformer models such as BERT
-> have shown good performance at language modelling. With the greater ability to
-> model sequential tasks, variants of Encoder-only models (like BERT4Rec, SASRec
-> etc.) have found success in sequential RS problems. Computing dot-product
-> attention in traditional transformer models has quadratic complexity in
-> sequence length. This is a bigger problem with RS because unlike language
-> models, new items are added to the catalogue every day. User buying history is
-> a dynamic sequence which depends on multiple factors. Recently, various linear
-> attention models have tried to solve this problem by making the model linear in
-> sequence length (token dimensions). Hydra attention is one such linear
-> complexity model proposed for vision transformers which reduces the complexity
-> of attention for both the number of tokens as well as model embedding
-> dimensions. Building on the idea of Hydra attention, we introduce an efficient
-> Transformer based Sequential RS (HydraRec) which significantly improves
-> theoretical complexity of computing attention for longer sequences and bigger
-> datasets while preserving the temporal context. Extensive experiments are
-> conducted to evaluate other linear transformer-based RS models and compared
-> with HydraRec across various evaluation metrics. HydraRec outperforms other
-> linear attention-based models as well as dot-product based attention models
-> when used with causal masking for sequential recommendation next item
-> prediction tasks. For bi-directional models its performance is comparable to
-> the BERT4Rec model with an improvement in running time.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing an efficient attention mechanism for sequential recommendation tasks using transformer-based models, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs). It does not investigate methods for improving LLM performance through the manipulation of textual input prompts, nor does it provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [Stealthy Backdoor Attack to Real-world Models in Android Apps](https://arxiv.org/abs/2501.01263v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01263v1
-
-**Abstract:**
-> Powered by their superior performance, deep neural networks (DNNs) have found
-> widespread applications across various domains. Many deep learning (DL) models
-> are now embedded in mobile apps, making them more accessible to end users
-> through on-device DL. However, deploying on-device DL to users' smartphones
-> simultaneously introduces several security threats. One primary threat is
-> backdoor attacks. Extensive research has explored backdoor attacks for several
-> years and has proposed numerous attack approaches. However, few studies have
-> investigated backdoor attacks on DL models deployed in the real world, or they
-> have shown obvious deficiencies in effectiveness and stealthiness. In this
-> work, we explore more effective and stealthy backdoor attacks on real-world DL
-> models extracted from mobile apps. Our main justification is that imperceptible
-> and sample-specific backdoor triggers generated by DNN-based steganography can
-> enhance the efficacy of backdoor attacks on real-world models. We first confirm
-> the effectiveness of steganography-based backdoor attacks on four
-> state-of-the-art DNN models. Subsequently, we systematically evaluate and
-> analyze the stealthiness of the attacks to ensure they are difficult to
-> perceive. Finally, we implement the backdoor attacks on real-world models and
-> compare our approach with three baseline methods. We collect 38,387 mobile
-> apps, extract 89 DL models from them, and analyze these models to obtain the
-> prerequisite model information for the attacks. After identifying the target
-> models, our approach achieves an average of 12.50% higher attack success rate
-> than DeepPayload while better maintaining the normal performance of the models.
-> Extensive experimental results demonstrate that our method enables more
-> effective, robust, and stealthy backdoor attacks on real-world models.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on backdoor attacks in deep neural networks deployed in Android apps, not on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet all the primary 'MUST' criteria.
-
----
-
-## [PIMAEX: Multi-Agent Exploration through Peer Incentivization](https://arxiv.org/abs/2501.01266v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01266v1
-
-**Abstract:**
-> While exploration in single-agent reinforcement learning has been studied
-> extensively in recent years, considerably less work has focused on its
-> counterpart in multi-agent reinforcement learning. To address this issue, this
-> work proposes a peer-incentivized reward function inspired by previous research
-> on intrinsic curiosity and influence-based rewards. The *PIMAEX* reward,
-> short for Peer-Incentivized Multi-Agent Exploration, aims to improve
-> exploration in the multi-agent setting by encouraging agents to exert influence
-> over each other to increase the likelihood of encountering novel states. We
-> evaluate the *PIMAEX* reward in conjunction with
-> *PIMAEX-Communication*, a multi-agent training algorithm that employs a
-> communication channel for agents to influence one another. The evaluation is
-> conducted in the *Consume/Explore* environment, a partially observable
-> environment with deceptive rewards, specifically designed to challenge the
-> exploration vs. exploitation dilemma and the credit-assignment problem. The
-> results empirically demonstrate that agents using the *PIMAEX* reward
-> with *PIMAEX-Communication* outperform those that do not.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-agent reinforcement learning, proposing a reward function and training algorithm for improved exploration, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus failing to meet the core subject requirement.
-
----
-
-## [NeutraSum: A Language Model can help a Balanced Media Diet by
-  Neutralizing News Summaries](https://arxiv.org/abs/2501.01284v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01284v1
-
-**Abstract:**
-> Media bias in news articles arises from the political polarisation of media
-> outlets, which can reinforce societal stereotypes and beliefs. Reporting on the
-> same event often varies significantly between outlets, reflecting their
-> political leanings through polarised language and focus. Although previous
-> studies have attempted to generate bias-free summaries from multiperspective
-> news articles, they have not effectively addressed the challenge of mitigating
-> inherent media bias. To address this gap, we propose **NeutraSum**, a
-> novel framework that integrates two neutrality losses to adjust the semantic
-> space of generated summaries, thus minimising media bias. These losses,
-> designed to balance the semantic distances across polarised inputs and ensure
-> alignment with expert-written summaries, guide the generation of neutral and
-> factually rich summaries. To evaluate media bias, we employ the political
-> compass test, which maps political leanings based on economic and social
-> dimensions. Experimental results on the Allsides dataset demonstrate that
-> NeutraSum not only improves summarisation performance but also achieves
-> significant reductions in media bias, offering a promising approach for neutral
-> news summarisation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a framework (NeutraSum) to mitigate media bias in news summaries, rather than on engineering or optimizing prompts specifically for Large Language Models (LLMs). While it involves generated summaries, the core subject is not prompt engineering for text-based interactions with LLMs, but rather the reduction of media bias.
-
----
-
-## [Citations and Trust in LLM Generated Responses](https://arxiv.org/abs/2501.01303v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01303v1
-
-**Abstract:**
-> Question answering systems are rapidly advancing, but their opaque nature may
-> impact user trust. We explored trust through an anti-monitoring framework,
-> where trust is predicted to be correlated with presence of citations and
-> inversely related to checking citations. We tested this hypothesis with a live
-> question-answering experiment that presented text responses generated using a
-> commercial Chatbot along with varying citations (zero, one, or five), both
-> relevant and random, and recorded if participants checked the citations and
-> their self-reported trust in the generated responses. We found a significant
-> increase in trust when citations were present, a result that held true even
-> when the citations were random; we also found a significant decrease in trust
-> when participants checked the citations. These results highlight the importance
-> of citations in enhancing trust in AI-generated content.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on enhancing user trust in AI-generated content through citations, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output.
-
----
-
-## [Multi-Head Explainer: A General Framework to Improve Explainability in
-  CNNs and Transformers](https://arxiv.org/abs/2501.01311v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01311v2
-
-**Abstract:**
-> In this study, we introduce the Multi-Head Explainer (MHEX), a versatile and
-> modular framework that enhances both the explainability and accuracy of
-> Convolutional Neural Networks (CNNs) and Transformer-based models. MHEX
-> consists of three core components: an Attention Gate that dynamically
-> highlights task-relevant features, Deep Supervision that guides early layers to
-> capture fine-grained details pertinent to the target class, and an Equivalent
-> Matrix that unifies refined local and global representations to generate
-> comprehensive saliency maps. Our approach demonstrates superior compatibility,
-> enabling effortless integration into existing residual networks like ResNet and
-> Transformer architectures such as BERT with minimal modifications. Extensive
-> experiments on benchmark datasets in medical imaging and text classification
-> show that MHEX not only improves classification accuracy but also produces
-> highly interpretable and detailed saliency scores.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a framework to enhance explainability and accuracy in CNNs and Transformers, rather than engineering prompts for Large Language Models (LLMs). Prompt engineering for LLMs is not the central concern, and the paper falls under excluded categories (development of new model components and medical subjects).
-
----
-
-## [Understanding Difficult-to-learn Examples in Contrastive Learning: A
-  Theoretical Framework for Spectral Contrastive Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01317v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01317v1
-
-**Abstract:**
-> Unsupervised contrastive learning has shown significant performance
-> improvements in recent years, often approaching or even rivaling supervised
-> learning in various tasks. However, its learning mechanism is fundamentally
-> different from that of supervised learning. Previous works have shown that
-> difficult-to-learn examples (well-recognized in supervised learning as examples
-> around the decision boundary), which are essential in supervised learning,
-> contribute minimally in unsupervised settings. In this paper, perhaps
-> surprisingly, we find that the direct removal of difficult-to-learn examples,
-> although it reduces the sample size, can boost the downstream classification
-> performance of contrastive learning. To uncover the reasons behind this, we
-> develop a theoretical framework modeling the similarity between different pairs
-> of samples. Guided by this theoretical framework, we conduct a thorough
-> theoretical analysis revealing that the presence of difficult-to-learn examples
-> negatively affects the generalization of contrastive learning. Furthermore, we
-> demonstrate that the removal of these examples, and techniques such as margin
-> tuning and temperature scaling can enhance its generalization bounds, thereby
-> improving performance. Empirically, we propose a simple and efficient mechanism
-> for selecting difficult-to-learn examples and validate the effectiveness of the
-> aforementioned methods, which substantiates the reliability of our proposed
-> theoretical framework.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on unsupervised contrastive learning, its theoretical framework, and improving downstream classification performance, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [DeepFilter: An Instrumental Baseline for Accurate and Efficient Process
-  Monitoring](https://arxiv.org/abs/http://arxiv.org/abs/2501.01342v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01342v1
-
-**Abstract:**
-> Effective process monitoring is increasingly vital in industrial automation
-> for ensuring operational safety, necessitating both high accuracy and
-> efficiency. Although Transformers have demonstrated success in various fields,
-> their canonical form based on the self-attention mechanism is inadequate for
-> process monitoring due to two primary limitations: (1) the step-wise
-> correlations captured by self-attention mechanism are difficult to capture
-> discriminative patterns in monitoring logs due to the lacking semantics of each
-> step, thus compromising accuracy; (2) the quadratic computational complexity of
-> self-attention hampers efficiency. To address these issues, we propose
-> DeepFilter, a Transformer-style framework for process monitoring. The core
-> innovation is an efficient filtering layer that excels at capturing long-term and
-> periodic patterns with reduced complexity. Equipped with the global filtering
-> layer, DeepFilter enhances both accuracy and efficiency, meeting the stringent
-> demands of process monitoring. Experimental results on real-world process
-> monitoring datasets validate DeepFilter's superiority in terms of accuracy and
-> efficiency compared to existing state-of-the-art models.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on developing a new Transformer-style framework (DeepFilter) for process monitoring in industrial automation, addressing limitations of self-attention mechanisms, which falls under 'development of new LLM architectures or training methods' and does not centrally concern prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-## [A Unified Hyperparameter Optimization Pipeline for Transformer-Based
-  Time Series Forecasting Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01394v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01394v1
-
-**Abstract:**
-> Transformer-based models for time series forecasting (TSF) have attracted
-> significant attention in recent years due to their effectiveness and
-> versatility. However, these models often require extensive hyperparameter
-> optimization (HPO) to achieve the best possible performance, and a unified
-> pipeline for HPO in transformer-based TSF remains lacking. In this paper, we
-> present one such pipeline and conduct extensive experiments on several
-> state-of-the-art (SOTA) transformer-based TSF models. These experiments are
-> conducted on standard benchmark datasets to evaluate and compare the
-> performance of different models, generating practical insights and examples.
-> Our pipeline is generalizable beyond transformer-based architectures and can be
-> applied to other SOTA models, such as Mamba and TimeMixer, as demonstrated in
-> our experiments. The goal of this work is to provide valuable guidance to both
-> industry practitioners and academic researchers in efficiently identifying
-> optimal hyperparameters suited to their specific domain applications. The code
-> and complete experimental results are available on GitHub.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on hyperparameter optimization for transformer-based time series forecasting models, rather than on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not provide examples of prompts or their impact on LLM output.
-
----
-
-## [On Unifying Video Generation and Camera Pose Estimation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01409v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01409v1
-
-**Abstract:**
-> Inspired by the emergent 3D capabilities in image generators, we explore
-> whether video generators similarly exhibit 3D awareness. Using
-> structure-from-motion (SfM) as a benchmark for 3D tasks, we investigate if
-> intermediate features from OpenSora, a video generation model, can support
-> camera pose estimation. We first examine native 3D awareness in video
-> generation features by routing raw intermediate outputs to SfM-prediction
-> modules like DUSt3R. Then, we explore the impact of fine-tuning on camera pose
-> estimation to enhance 3D awareness. Results indicate that while video generator
-> features have limited inherent 3D awareness, task-specific supervision
-> significantly boosts their accuracy for camera pose estimation, resulting in
-> competitive performance. The proposed unified model, named JOG3R, produces
-> camera pose estimates with competitive quality without degrading video
-> generation quality.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on video generation and camera pose estimation, which falls under image/video generation, not text generation driven by Large Language Models (LLMs), thereby violating the 'Papers MUST NOT' criteria #2.
-
----
-
-## [Balance-aware Sequence Sampling Makes Multi-modal Learning Better](https://arxiv.org/abs/http://arxiv.org/abs/2501.01470v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01470v1
-
-**Abstract:**
-> To address the modality imbalance caused by data heterogeneity, existing
-> multi-modal learning (MML) approaches primarily focus on balancing this
-> difference from the perspective of optimization objectives. However, almost all
-> existing methods ignore the impact of sample sequences, i.e., an inappropriate
-> training order tends to trigger learning bias in the model, further
-> exacerbating modality imbalance. In this paper, we propose Balance-aware
-> Sequence Sampling (BSS) to enhance the robustness of MML. Specifically, we
-> first define a multi-perspective measurer to evaluate the balance degree of
-> each sample. Via the evaluation, we employ a heuristic scheduler based on
-> curriculum learning (CL) that incrementally provides training subsets,
-> progressing from balanced to imbalanced samples to rebalance MML. Moreover,
-> considering that sample balance may evolve as the model capability increases,
-> we propose a learning-based probabilistic sampling method to dynamically update
-> the training sequences at the epoch level, further improving MML performance.
-> Extensive experiments on widely used datasets demonstrate the superiority of
-> our method compared with state-of-the-art (SOTA) MML approaches.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing multi-modal learning (MML) through balance-aware sequence sampling, ignoring the core subject of prompt engineering for text-based interactions with Large Language Models (LLMs), and instead addressing modality imbalance in MML without mentioning prompts or LLM-specific text generation.
-
----
-
-## [Augmented Contrastive Clustering with Uncertainty-Aware Prototyping for
-  Time Series Test Time Adaptation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01472v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01472v1
-
-**Abstract:**
-> Test-time adaptation aims to adapt pre-trained deep neural networks using
-> solely online unlabelled test data during inference. Although TTA has shown
-> promise in visual applications, its potential in time series contexts remains
-> largely unexplored. Existing TTA methods, originally designed for visual tasks,
-> may not effectively handle the complex temporal dynamics of real-world time
-> series data, resulting in suboptimal adaptation performance. To address this
-> gap, we propose Augmented Contrastive Clustering with Uncertainty-aware
-> Prototyping (ACCUP), a straightforward yet effective TTA method for time series
-> data. Initially, our approach employs augmentation ensemble on the time series
-> data to capture diverse temporal information and variations, incorporating
-> uncertainty-aware prototypes to distill essential characteristics.
-> Additionally, we introduce an entropy comparison scheme to selectively acquire
-> more confident predictions, enhancing the reliability of pseudo labels.
-> Furthermore, we utilize augmented contrastive clustering to enhance feature
-> discriminability and mitigate error accumulation from noisy pseudo labels,
-> promoting cohesive clustering within the same class while facilitating clear
-> separation between different classes. Extensive experiments conducted on three
-> real-world time series datasets and an additional visual dataset demonstrate
-> the effectiveness and generalization potential of the proposed method,
-> advancing the underexplored realm of TTA for time series data.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on test-time adaptation for time series data using deep neural networks, without any mention of Large Language Models (LLMs), prompt engineering, or manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [Unraveling Indirect In-Context Learning Using Influence Functions](https://arxiv.org/abs/http://arxiv.org/abs/2501.01473v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01473v1
-
-**Abstract:**
-> This work introduces a novel paradigm for generalized In-Context Learning
-> (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore
-> demonstration selection strategies tailored for two distinct real-world
-> scenarios: Mixture of Tasks and Noisy Demonstrations. We systematically
-> evaluate the effectiveness of Influence Functions (IFs) as a selection tool for
-> these settings, highlighting the potential for IFs to better capture the
-> informativeness of examples within the demonstration pool. For the Mixture of
-> Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU,
-> BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining
-> BertScore-Recall (BSR) with an IF surrogate model can significantly improve
-> performance, leading to average absolute accuracy gains of 0.37\% and 1.45\%
-> for 3-shot and 5-shot setups when compared to traditional ICL metrics. In the
-> Noisy Demonstrations setting, we examine scenarios where demonstrations might
-> be mislabeled. Our experiments show that reweighting traditional ICL selectors
-> (BSR and Cosine Similarity) with IF-based selectors boosts accuracy by an
-> average of 2.90\% for Cosine Similarity and 2.94\% for BSR on noisy GLUE
-> benchmarks. In sum, we propose a robust framework for demonstration selection
-> that generalizes beyond traditional ICL, offering valuable insights into the
-> role of IFs for Indirect ICL.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on demonstration selection strategies using Influence Functions for In-Context Learning, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [A Survey of Deep Learning Methods in Protein Bioinformatics and its
-  Impact on Protein Design](https://arxiv.org/abs/http://arxiv.org/abs/2501.01477v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01477v1
-
-**Abstract:**
-> Proteins are sequences of amino acids that serve as the basic building blocks
-> of living organisms. Despite rapidly growing databases documenting structural
-> and functional information for various protein sequences, our understanding of
-> proteins remains limited because of the large possible sequence space and the
-> complex inter- and intra-molecular forces. Deep learning, which is
-> characterized by its ability to learn relevant features directly from large
-> datasets, has demonstrated remarkable performance in fields such as computer
-> vision and natural language processing. It has also been increasingly applied
-> in recent years to the data-rich domain of protein sequences with great
-> success, most notably with Alphafold2's breakout performance in the protein
-> structure prediction. The performance improvements achieved by deep learning
-> unlock new possibilities in the field of protein bioinformatics, including
-> protein design, one of the most difficult but useful tasks. In this paper, we
-> broadly categorize problems in protein bioinformatics into three main
-> categories: 1) structural prediction, 2) functional prediction, and 3) protein
-> design, and review the progress achieved from using deep learning methodologies
-> in each of them. We expand on the main challenges of the protein design problem
-> and highlight how advances in structural and functional prediction have
-> directly contributed to design tasks. Finally, we conclude by identifying
-> important topics and future research directions.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on deep learning methods in protein bioinformatics and protein design, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Drift2Matrix: Kernel-Induced Self Representation for Concept Drift
-  Adaptation in Co-evolving Time Series](https://arxiv.org/abs/http://arxiv.org/abs/2501.01480v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01480v2
-
-**Abstract:**
-> In the realm of time series analysis, tackling the phenomenon of concept
-> drift poses a significant challenge. Concept drift -- characterized by the
-> evolving statistical properties of time series data, affects the reliability
-> and accuracy of conventional analysis models. This is particularly evident in
-> co-evolving scenarios where interactions among variables are crucial. This
-> paper presents Drift2Matrix, a novel framework that leverages kernel-induced
-> self-representation for adaptive responses to concept drift in time series.
-> Drift2Matrix employs a kernel-based learning mechanism to generate a
-> representation matrix, encapsulating the inherent dynamics of co-evolving time
-> series. This matrix serves as a key tool for identification and adaptation to
-> concept drift by observing its temporal variations. Furthermore, Drift2Matrix
-> effectively identifies prevailing patterns and offers insights into emerging
-> trends through pattern evolution analysis. Our empirical evaluation of
-> Drift2Matrix across various datasets demonstrates its effectiveness in handling
-> the complexities of concept drift. This approach introduces a novel perspective
-> in the theoretical domain of co-evolving time series analysis, enhancing
-> adaptability and accuracy in the face of dynamic data environments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs). Instead, it presents a framework (Drift2Matrix) for adapting to concept drift in co-evolving time series analysis, which is unrelated to the engineering, design, or optimization of prompts for LLMs.
-
----
-
-## [BoxingGym: Benchmarking Progress in Automated Experimental Design and
-  Model Discovery](https://arxiv.org/abs/http://arxiv.org/abs/2501.01540v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01540v1
-
-**Abstract:**
-> Understanding the world and explaining it with scientific theories is a
-> central aspiration of artificial intelligence research. Proposing theories,
-> designing experiments to test them, and then revising them based on data are
-> fundamental to scientific discovery. Despite the significant promise of
-> LLM-based scientific agents, no benchmarks systematically test LLM's ability to
-> propose scientific models, collect experimental data, and revise them in light
-> of new data. We introduce BoxingGym, a benchmark with 10 environments for
-> systematically evaluating both experimental design (e.g. collecting data to
-> test a scientific theory) and model discovery (e.g. proposing and revising
-> scientific theories). To enable tractable and quantitative evaluation, we
-> implement each environment as a generative probabilistic model with which a
-> scientific agent can run interactive experiments. These probabilistic models
-> are drawn from various real-world scientific domains ranging from psychology to
-> ecology. To quantitatively evaluate a scientific agent's ability to collect
-> informative experimental data, we compute the expected information gain (EIG),
-> an information-theoretic quantity which measures how much an experiment reduces
-> uncertainty about the parameters of a generative model. A good scientific
-> theory is a concise and predictive explanation. Therefore, to quantitatively
-> evaluate model discovery, we ask a scientific agent to explain their model and
-> then assess whether this explanation enables another scientific agent to make
-> reliable predictions about this environment. In addition to this
-> explanation-based evaluation, we compute standard model evaluation metrics such
-> as prediction errors. We find that current LLMs, such as GPT-4o, struggle with
-> both experimental design and model discovery. We find that augmenting the
-> LLM-based agent with an explicit statistical model does not reliably improve
-> these results.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on benchmarking scientific discovery and experimentation using LLMs, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs). While LLMs are utilized, the core subject is the evaluation of scientific agents, not prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Constructing and explaining machine learning models for chemistry:
-  example of the exploration and design of boron-based Lewis acids](https://arxiv.org/abs/http://arxiv.org/abs/2501.01576v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01576v2
-
-**Abstract:**
-> The integration of machine learning (ML) into chemistry offers transformative
-> potential in the design of molecules with targeted properties. However, the
-> focus has often been on creating highly efficient predictive models, sometimes
-> at the expense of interpretability. In this study, we leverage explainable AI
-> techniques to explore the rational design of boron-based Lewis acids, which
-> play a pivotal role in organic reactions due to their electron-accepting
-> properties. Using Fluoride Ion Affinity as a proxy for Lewis acidity, we
-> developed interpretable ML models based on chemically meaningful descriptors,
-> including ab initio computed features and substituent-based parameters derived
-> from the Hammett linear free-energy relationship. By constraining the chemical
-> space to well-defined molecular scaffolds, we achieved highly accurate
-> predictions (mean absolute error < 6 kJ/mol), surpassing conventional black-box
-> deep learning models in low-data regimes. Interpretability analyses of the
-> models shed light on the origin of Lewis acidity in these compounds and
-> identified actionable levers to modulate it through the nature and positioning
-> of substituents on the molecular scaffold. This work bridges ML and chemists'
-> way of thinking, demonstrating how explainable models can inspire molecular
-> design and enhance scientific understanding of chemical reactivity.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing interpretable machine learning models for chemistry applications, with no apparent emphasis on prompt engineering, Large Language Models (LLMs), or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering:
-  Methodology, Results, and Challenges](https://arxiv.org/abs/http://arxiv.org/abs/2501.01588v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01588v1
-
-**Abstract:**
-> Large Language Models (LLMs) have become essential tools across various
-> domains due to their impressive capabilities in understanding and generating
-> human-like text. The ability to accurately answer multiple-choice questions
-> (MCQs) holds significant value in education, particularly in automated tutoring
-> systems and assessment platforms. However, adapting LLMs to handle MCQ tasks
-> effectively remains challenging due to the hallucinations and unclear prompts.
-> This work explores the potential of Microsoft's PHI-3\cite{Abdin2024}, a
-> compact yet efficient LLM, for MCQ answering. Our contributions include
-> fine-tuning the model on the TruthfulQA dataset, designing optimized prompts to
-> enhance model performance, and evaluating using perplexity and traditional
-> metrics like accuracy and F1 score. Results show a remarkable improvement in
-> PHI-3.5's MCQ handling post-fine-tuning, with perplexity decreasing from 4.68
-> to 2.27, and accuracy rising from 62\% to 90.8\%. This research underlines the
-> importance of efficient models in adaptive learning systems and educational
-> assessments, paving the way for broader integration into the classroom,
-> particularly in fields like test preparation, student feedback, and
-> personalized learning.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on fine-tuning a Large Language Model (PHI-3) for multiple-choice question answering, which falls under the development of new training methods, contravening the 'MUST NOT' criterion 1. Although it mentions designing optimized prompts, this is secondary to the fine-tuning methodology, not meeting the core subject requirement of prompt engineering being the central focus."
-}
-
----
-
-## [Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible
-  User Perception](https://arxiv.org/abs/http://arxiv.org/abs/2501.01598v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01598v1
-
-**Abstract:**
-> A wide range of user perception applications leverage inertial measurement
-> unit (IMU) data for online prediction. However, restricted by the non-i.i.d.
-> nature of IMU data collected from mobile devices, most systems work well only
-> in a controlled setting (e.g., for a specific user in particular postures),
-> limiting application scenarios. Achieving uncontrolled online prediction on
-> mobile devices, referred to as the flexible user perception (FUP) problem, is
-> attractive but hard. In this paper, we propose a novel scheme, called Prism,
-> which can obtain high FUP accuracy on mobile devices. The core of Prism is to
-> discover task-aware domains embedded in IMU dataset, and to train a
-> domain-aware model on each identified domain. To this end, we design an
-> expectation-maximization (EM) algorithm to estimate latent domains with respect
-> to the specific downstream perception task. Finally, the best-fit model can be
-> automatically selected for use by comparing the test sample and all identified
-> domains in the feature space. We implement Prism on various mobile devices and
-> conduct extensive experiments. Results demonstrate that Prism can achieve the
-> best FUP performance with a low latency.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it proposes a scheme for improving user perception through task-aware domains in IMU data, with no mention of LLMs or prompt engineering for text-based interactions.
-
----
-
-## [Few-shot Implicit Function Generation via Equivariance](https://arxiv.org/abs/http://arxiv.org/abs/2501.01601v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01601v1
-
-**Abstract:**
-> Implicit Neural Representations (INRs) have emerged as a powerful framework
-> for representing continuous signals. However, generating diverse INR weights
-> remains challenging due to limited training data. We introduce Few-shot
-> Implicit Function Generation, a new problem setup that aims to generate diverse
-> yet functionally consistent INR weights from only a few examples. This is
-> challenging because even for the same signal, the optimal INRs can vary
-> significantly depending on their initializations. To tackle this, we propose
-> EquiGen, a framework that can generate new INRs from limited data. The core
-> idea is that functionally similar networks can be transformed into one another
-> through weight permutations, forming an equivariance group. By projecting these
-> weights into an equivariant latent space, we enable diverse generation within
-> these groups, even with few examples. EquiGen implements this through an
-> equivariant encoder trained via contrastive learning and smooth augmentation,
-> an equivariance-guided diffusion process, and controlled perturbations in the
-> equivariant subspace. Experiments on 2D image and 3D shape INR datasets
-> demonstrate that our approach effectively generates diverse INR weights while
-> preserving their functional properties in few-shot scenarios.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on generating Implicit Neural Representations (INRs) weights for continuous signal representation, not on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet the primary 'MUST' criteria."
-}
-
----
-
-## [Google is all you need: Semi-Supervised Transfer Learning Strategy For
-  Light Multimodal Multi-Task Classification Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.01611v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01611v1
-
-**Abstract:**
-> As the volume of digital image data increases, the effectiveness of image
-> classification intensifies. This study introduces a robust multi-label
-> classification system designed to assign multiple labels to a single image,
-> addressing the complexity of images that may be associated with multiple
-> categories (ranging from 1 to 19, excluding 12). We propose a multi-modal
-> classifier that merges advanced image recognition algorithms with Natural
-> Language Processing (NLP) models, incorporating a fusion module to integrate
-> these distinct modalities. The purpose of integrating textual data is to
-> enhance the accuracy of label prediction by providing contextual understanding
-> that visual analysis alone cannot fully capture. Our proposed classification
-> model combines Convolutional Neural Networks (CNN) for image processing with
-> NLP techniques for analyzing textual description (i.e., captions). This
-> approach includes rigorous training and validation phases, with each model
-> component verified and analyzed through ablation experiments. Preliminary
-> results demonstrate the classifier's accuracy and efficiency, highlighting its
-> potential as an automatic image-labeling system.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a multi-modal classification model for image classification, integrating NLP for contextual understanding, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs).
-
----
-
-## [Merging Context Clustering with Visual State Space Models for Medical
-  Image Segmentation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01618v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01618v1
-
-**Abstract:**
-> Medical image segmentation demands the aggregation of global and local
-> feature representations, posing a challenge for current methodologies in
-> handling both long-range and short-range feature interactions. Recently, vision
-> mamba (ViM) models have emerged as promising solutions for addressing model
-> complexities by excelling in long-range feature iterations with linear
-> complexity. However, existing ViM approaches overlook the importance of
-> preserving short-range local dependencies by directly flattening spatial tokens
-> and are constrained by fixed scanning patterns that limit the capture of
-> dynamic spatial context information. To address these challenges, we introduce
-> a simple yet effective method named context clustering ViM (CCViM), which
-> incorporates a context clustering module within the existing ViM models to
-> segment image tokens into distinct windows for adaptable local clustering. Our
-> method effectively combines long-range and short-range feature interactions,
-> thereby enhancing spatial contextual representations for medical image
-> segmentation tasks. Extensive experimental evaluations on diverse public
-> datasets, i.e., Kumar, CPM17, ISIC17, ISIC18, and Synapse demonstrate the
-> superior performance of our method compared to current state-of-the-art
-> methods. Our code can be found at https://github.com/zymissy/CCViM.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on medical image segmentation, incorporating visual state space models, and does not address prompt engineering for Large Language Models (LLMs) or demonstrate the impact of textual input prompts on LLM output, thus failing to meet the 'MUST' criteria.
-
----
-
-## [Implications of Artificial Intelligence on Health Data Privacy and
-  Confidentiality](https://arxiv.org/abs/http://arxiv.org/abs/2501.01639v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01639v2
-
-**Abstract:**
-> The rapid integration of artificial intelligence (AI) in healthcare is
-> revolutionizing medical diagnostics, personalized medicine, and operational
-> efficiency. However, alongside these advancements, significant challenges arise
-> concerning patient data privacy, ethical considerations, and regulatory
-> compliance. This paper examines the dual impact of AI on healthcare,
-> highlighting its transformative potential and the critical need for
-> safeguarding sensitive health information. It explores the role of the Health
-> Insurance Portability and Accountability Act (HIPAA) as a regulatory framework
-> for ensuring data privacy and security, emphasizing the importance of robust
-> safeguards and ethical standards in AI-driven healthcare. Through case studies,
-> including AI applications in diabetic retinopathy, oncology, and the
-> controversies surrounding data sharing, this study underscores the ethical and
-> legal complexities of AI implementation. A balanced approach that fosters
-> innovation while maintaining patient trust and privacy is imperative. The
-> findings emphasize the importance of continuous education, transparency, and
-> adherence to regulatory frameworks to harness AI's full potential responsibly
-> and ethically in healthcare.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the implications of AI on health data privacy and confidentiality, rather than on the engineering, design, or optimization of prompts for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria for focusing on prompt engineering for LLMs.
-
----
-
-## [HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long
-  Video Understanding](https://arxiv.org/abs/http://arxiv.org/abs/2501.01645v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01645v1
-
-**Abstract:**
-> Multimodal large language models have become a popular topic in deep visual
-> understanding due to many promising real-world applications. However, hour-long
-> video understanding, spanning over one hour and containing tens of thousands of
-> visual frames, remains under-explored because of 1) challenging long-term video
-> analyses, 2) inefficient large-model approaches, and 3) lack of large-scale
-> benchmark datasets. Among them, in this paper, we focus on building a
-> large-scale hour-long long video benchmark, HLV-1K, designed to evaluate long
-> video understanding models. HLV-1K comprises 1009 hour-long videos with 14,847
-> high-quality question answering (QA) and multi-choice question answering (MCQA)
-> pairs with time-aware query and diverse annotations, covering frame-level,
-> within-event-level, cross-event-level, and long-term reasoning tasks. We
-> evaluate our benchmark using existing state-of-the-art methods and demonstrate
-> its value for testing deep long video understanding capabilities at different
-> levels and for various tasks. This includes promoting future long video
-> understanding tasks at a granular level, such as deep understanding of long
-> live videos, meeting recordings, and movies.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on building a benchmark dataset for hour-long video understanding, involving multimodal large language models, but does not concentrate on prompt engineering for text-based interactions with LLMs, instead emphasizing video analysis and understanding."
-}
-
----
-
-## [AVATAR: Adversarial Autoencoders with Autoregressive Refinement for Time
-  Series Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01649v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01649v1
-
-**Abstract:**
-> Data augmentation can significantly enhance the performance of machine
-> learning tasks by addressing data scarcity and improving generalization.
-> However, generating time series data presents unique challenges. A model must
-> not only learn a probability distribution that reflects the real data
-> distribution but also capture the conditional distribution at each time step to
-> preserve the inherent temporal dependencies. To address these challenges, we
-> introduce AVATAR, a framework that combines Adversarial Autoencoders (AAE) with
-> Autoregressive Learning to achieve both objectives. Specifically, our technique
-> integrates the autoencoder with a supervisor and introduces a novel supervised
-> loss to assist the decoder in learning the temporal dynamics of time series
-> data. Additionally, we propose another innovative loss function, termed
-> distribution loss, to guide the encoder in more efficiently aligning the
-> aggregated posterior of the autoencoder's latent representation with a prior
-> Gaussian distribution. Furthermore, our framework employs a joint training
-> mechanism to simultaneously train all networks using a combined loss, thereby
-> fulfilling the dual objectives of time series generation. We evaluate our
-> technique across a variety of time series datasets with diverse
-> characteristics. Our experiments demonstrate significant improvements in both
-> the quality and practical utility of the generated data, as assessed by various
-> qualitative and quantitative metrics.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new framework (AVATAR) for time series generation using adversarial autoencoders and autoregressive learning, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [EAUWSeg: Eliminating annotation uncertainty in weakly-supervised medical
-  image segmentation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01658v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01658v1
-
-**Abstract:**
-> Weakly-supervised medical image segmentation is gaining traction as it
-> requires only rough annotations rather than accurate pixel-to-pixel labels,
-> thereby reducing the workload for specialists. Although some progress has been
-> made, there is still a considerable performance gap between the label-efficient
-> methods and fully-supervised ones, which can be attributed to the uncertain
-> nature of these weak labels. To address this issue, we propose a novel weak
-> annotation method coupled with its learning framework EAUWSeg to eliminate the
-> annotation uncertainty. Specifically, we first propose the Bounded Polygon
-> Annotation (BPAnno) by simply labeling two polygons for a lesion. Then, the
-> tailored learning mechanism that explicitly treats bounded polygons as two
-> separated annotations is proposed to learn invariant feature by providing
-> adversarial supervision signal for model training. Subsequently, a
-> confidence-auxiliary consistency learner incorporated with a
-> classification-guided confidence generator is designed to provide reliable
-> supervision signal for pixels in uncertain region by leveraging the feature
-> presentation consistency across pixels within the same category as well as
-> class-specific information encapsulated in bounded polygons annotation.
-> Experimental results demonstrate that EAUWSeg outperforms existing
-> weakly-supervised segmentation methods. Furthermore, compared to
-> fully-supervised counterparts, the proposed method not only delivers superior
-> performance but also costs much less annotation workload. This underscores the
-> superiority and effectiveness of our approach.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on medical image segmentation, addressing annotation uncertainty in weakly-supervised learning. It falls under the 'MUST NOT' criterion of being primarily concerned with medical subjects, and it does not focus on prompt engineering for Large Language Models (LLMs).
-
----
-
-## [BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat
-  Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.01664v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01664v1
-
-**Abstract:**
-> The integration of Internet of Things (IoT) technology in various domains has
-> led to operational advancements, but it has also introduced new vulnerabilities
-> to cybersecurity threats, as evidenced by recent widespread cyberattacks on IoT
-> devices. Intrusion detection systems are often reactive, triggered by specific
-> patterns or anomalies observed within the network. To address this challenge,
-> this work proposes a proactive approach to anticipate and preemptively mitigate
-> malicious activities, aiming to prevent potential damage before it occurs. This
-> paper proposes an innovative intrusion prediction framework empowered by
-> Pre-trained Large Language Models (LLMs). The framework incorporates two LLMs:
-> a fine-tuned Bidirectional and AutoRegressive Transformers (BART) model for
-> predicting network traffic and a fine-tuned Bidirectional Encoder
-> Representations from Transformers (BERT) model for evaluating the predicted
-> traffic. By harnessing the bidirectional capabilities of BART the framework
-> then identifies malicious packets among these predictions. Evaluated using the
-> CICIoT2023 IoT attack dataset, our framework showcases a notable enhancement in
-> predictive performance, attaining an impressive 98% overall accuracy, providing
-> a powerful response to the cybersecurity challenges that confront IoT networks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing an intrusion prediction framework for IoT security using fine-tuned LLMs, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models. The abstract describes fine-tuning BART and BERT and does not mention prompt engineering.
-
----
-
-## [VidFormer: A novel end-to-end framework fused by 3DCNN and Transformer
-  for Video-based Remote Physiological Measurement](https://arxiv.org/abs/http://arxiv.org/abs/2501.01691v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01691v2
-
-**Abstract:**
-> Remote physiological signal measurement based on facial videos, also known as
-> remote photoplethysmography (rPPG), involves predicting changes in facial
-> vascular blood flow from facial videos. While most deep learning-based methods
-> have achieved good results, they often struggle to balance performance across
-> small and large-scale datasets due to the inherent limitations of convolutional
-> neural networks (CNNs) and Transformer. In this paper, we introduce VidFormer,
-> a novel end-to-end framework that integrates 3-Dimension Convolutional Neural
-> Network (3DCNN) and Transformer models for rPPG tasks. Initially, we conduct an
-> analysis of the traditional skin reflection model and subsequently introduce an
-> enhanced model for the reconstruction of rPPG signals. Based on this improved
-> model, VidFormer utilizes 3DCNN and Transformer to extract local and global
-> features from input data, respectively. To enhance the spatiotemporal feature
-> extraction capabilities of VidFormer, we incorporate temporal-spatial attention
-> mechanisms tailored for both 3DCNN and Transformer. Additionally, we design a
-> module to facilitate information exchange and fusion between the 3DCNN and
-> Transformer. Our evaluation on five publicly available datasets demonstrates
-> that VidFormer outperforms current state-of-the-art (SOTA) methods. Finally, we
-> discuss the essential roles of each VidFormer module and examine the effects of
-> ethnicity, makeup, and exercise on its performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a novel end-to-end framework for video-based remote physiological measurement, utilizing 3DCNN and Transformer, with no primary focus on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or improving LLM performance through textual input prompts.
-
----
-
-## [The Essence of Contextual Understanding in Theory of Mind: A Study on
-  Question Answering with Story Characters](https://arxiv.org/abs/http://arxiv.org/abs/2501.01705v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01705v1
-
-**Abstract:**
-> Theory-of-Mind (ToM) is a fundamental psychological capability that allows
-> humans to understand and interpret the mental states of others. Humans infer
-> others' thoughts by integrating causal cues and indirect clues from broad
-> contextual information, often derived from past interactions. In other words,
-> human ToM heavily relies on the understanding about the backgrounds and life
-> stories of others. Unfortunately, this aspect is largely overlooked in existing
-> benchmarks for evaluating machines' ToM capabilities, due to their usage of
-> short narratives without global backgrounds. In this paper, we verify the
-> importance of understanding long personal backgrounds in ToM and assess the
-> performance of LLMs in such realistic evaluation scenarios. To achieve this, we
-> introduce a novel benchmark, CharToM-QA, comprising 1,035 ToM questions based
-> on characters from classic novels. Our human study reveals a significant
-> disparity in performance: the same group of educated participants performs
-> dramatically better when they have read the novels compared to when they have
-> not. In parallel, our experiments on state-of-the-art LLMs, including the very
-> recent o1 model, show that LLMs still perform notably worse than humans,
-> despite that they have seen these stories during pre-training. This highlights
-> the limitations of current LLMs in capturing the nuanced contextual information
-> required for ToM reasoning.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on evaluating Large Language Models' (LLMs) performance in Theory of Mind tasks using a novel benchmark, rather than on the engineering, design, or optimization of prompts specifically for LLMs, as required by the criteria."
-}
-
----
-
-## [MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders](https://arxiv.org/abs/http://arxiv.org/abs/2501.01709v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01709v1
-
-**Abstract:**
-> Visual encoders are fundamental components in vision-language models (VLMs),
-> each showcasing unique strengths derived from various pre-trained visual
-> foundation models. To leverage the various capabilities of these encoders,
-> recent studies incorporate multiple encoders within a single VLM, leading to a
-> considerable increase in computational cost. In this paper, we present
-> Mixture-of-Visual-Encoder Knowledge Distillation (MoVE-KD), a novel framework
-> that distills the unique proficiencies of multiple vision encoders into a
-> single, efficient encoder model. Specifically, to mitigate conflicts and retain
-> the unique characteristics of each teacher encoder, we employ low-rank
-> adaptation (LoRA) and mixture-of-experts (MoEs) to selectively activate
-> specialized knowledge based on input features, enhancing both adaptability and
-> efficiency. To regularize the KD process and enhance performance, we propose an
-> attention-based distillation strategy that adaptively weighs the different
-> visual encoders and emphasizes valuable visual tokens, reducing the burden of
-> replicating comprehensive but distinct features from multiple teachers.
-> Comprehensive experiments on popular VLMs, such as LLaVA and LLaVA-NeXT,
-> validate the effectiveness of our method. The code will be released.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on knowledge distillation for Vision-Language Models (VLMs) with multiple visual encoders, which does not meet the criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User
-  Queries](https://arxiv.org/abs/http://arxiv.org/abs/2501.01711v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01711v1
-
-**Abstract:**
-> The paper presents a preliminary analysis of an experiment conducted by Frank
-> Bold, a Czech expert group, to explore user interactions with GPT-4 for
-> addressing legal queries. Between May 3, 2023, and July 25, 2023, 1,252 users
-> submitted 3,847 queries. Unlike studies that primarily focus on the accuracy,
-> factuality, or hallucination tendencies of large language models (LLMs), our
-> analysis focuses on the user query dimension of the interaction. Using GPT-4o
-> for zero-shot classification, we categorized queries on (1) whether users
-> provided factual information about their issue (29.95%) or not (70.05%), (2)
-> whether they sought legal information (64.93%) or advice on the course of
-> action (35.07%), and (3) whether they imposed requirements to shape or control
-> the model's answer (28.57%) or not (71.43%). We provide both quantitative and
-> qualitative insight into user needs and contribute to a better understanding of
-> user engagement with LLMs.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on understanding user needs and engagement with LLMs for legal queries, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), as required by the criteria."
-}
-
----
-
-## [Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal
-  Reinforcement Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01727v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01727v1
-
-**Abstract:**
-> Humanoid robots must master numerous tasks with sparse rewards, posing a
-> challenge for reinforcement learning (RL). We propose a method combining RL and
-> automated planning to address this. Our approach uses short goal-conditioned
-> policies (GCPs) organized hierarchically, with Monte Carlo Tree Search (MCTS)
-> planning using high-level actions (HLAs). Instead of primitive actions, the
-> planning process generates HLAs. A single plan-tree, maintained during the
-> agent's lifetime, holds knowledge about goal achievement. This hierarchy
-> enhances sample efficiency and speeds up reasoning by reusing HLAs and
-> anticipating future actions. Our Hierarchical Goal-Conditioned Policy Planning
-> (HGCPP) framework uniquely integrates GCPs, MCTS, and hierarchical RL,
-> potentially improving exploration and planning in complex tasks.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on multi-goal reinforcement learning for humanoid robots, using techniques like Monte Carlo Tree Search and hierarchical RL, with no apparent connection to Large Language Models (LLMs) or prompt engineering for text-based interactions."
-}
-
----
-
-## [Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item
-  Detection under Noisy Annotations](https://arxiv.org/abs/http://arxiv.org/abs/2501.01733v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01733v1
-
-**Abstract:**
-> Automatic X-ray prohibited item detection is vital for public safety.
-> Existing deep learning-based methods all assume that the annotations of
-> training X-ray images are correct. However, obtaining correct annotations is
-> extremely hard if not impossible for large-scale X-ray images, where item
-> overlapping is ubiquitous. As a result, X-ray images are easily contaminated
-> with noisy annotations, leading to performance deterioration of existing
-> methods. In this paper, we address the challenging problem of training a robust
-> prohibited item detector under noisy annotations (including both category noise
-> and bounding box noise) from a novel perspective of data augmentation, and
-> propose an effective label-aware mixed patch paste augmentation method
-> (Mix-Paste). Specifically, for each item patch, we mix several item patches
-> with the same category label from different images and replace the original
-> patch in the image with the mixed patch. In this way, the probability of
-> containing the correct prohibited item within the generated image is increased.
-> Meanwhile, the mixing process mimics item overlapping, enabling the model to
-> learn the characteristics of X-ray images. Moreover, we design an item-based
-> large-loss suppression (LLS) strategy to suppress the large losses
-> corresponding to potentially positive predictions of additional items due to
-> the mixing operation. We show the superiority of our method on X-ray datasets
-> under noisy annotations. In addition, we evaluate our method on the noisy
-> MS-COCO dataset to showcase its generalization ability. These results clearly
-> indicate the great potential of data augmentation to handle noisy annotations.
-> The source code is released at https://github.com/wscds/Mix-Paste.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on data augmentation for improving prohibited item detection in X-ray images using deep learning, with no primary concern on prompt engineering, Large Language Models (LLMs), or textual input prompts."
-}
-
----
-
-## [Automating Legal Concept Interpretation with LLMs: Retrieval,
-  Generation, and Evaluation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01743v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01743v1
-
-**Abstract:**
-> Legal articles often include vague concepts to adapt to the ever-changing
-> society. Providing detailed interpretations of these concepts is a critical
-> task for legal practitioners, which requires meticulous and professional
-> annotations by legal experts, admittedly time-consuming and expensive to
-> collect at scale. In this paper, we introduce a novel retrieval-augmented
-> generation framework, ATRI, for AuTomatically Retrieving relevant information
-> from past judicial precedents and Interpreting vague legal concepts. We further
-> propose a new benchmark, Legal Concept Entailment, to automate the evaluation
-> of generated concept interpretations without expert involvement. Automatic
-> evaluations indicate that our generated interpretations can effectively assist
-> large language models (LLMs) in understanding vague legal concepts.
-> Multi-faceted evaluations by legal experts indicate that the quality of our
-> concept interpretations is comparable to those written by human experts. Our
-> work has strong implications for leveraging LLMs to support legal practitioners
-> in interpreting vague legal concepts and beyond.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a retrieval-augmented generation framework for legal concept interpretation, leveraging LLMs as a component, rather than focusing specifically on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Creating Artificial Students that Never Existed: Leveraging Large
-  Language Models and CTGANs for Synthetic Data Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01793v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01793v1
-
-**Abstract:**
-> In this study, we explore the growing potential of AI and deep learning
-> technologies, particularly Generative Adversarial Networks (GANs) and Large
-> Language Models (LLMs), for generating synthetic tabular data. Access to
-> quality student data is critical for advancing learning analytics, but privacy
-> concerns and stricter data protection regulations worldwide limit their
-> availability and usage. Synthetic data offers a promising alternative. We
-> investigate whether synthetic data can be leveraged to create artificial
-> students for serving learning analytics models. Using the popular GAN model
-> CTGAN and three LLMs (GPT2, DistilGPT2, and DialoGPT), we generate synthetic
-> tabular student data. Our results demonstrate the strong potential of these
-> methods to produce high-quality synthetic datasets that resemble real student
-> data. To validate our findings, we apply a comprehensive set of utility
-> evaluation metrics to assess the statistical and predictive performance of the
-> synthetic data and compare the different generator models used, especially the
-> performance of LLMs. Our study aims to provide the learning analytics community
-> with valuable insights into the use of synthetic data, laying the groundwork
-> for expanding the field's methodological toolbox with new innovative approaches
-> for learning analytics data generation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on generating synthetic tabular data for learning analytics using GANs and LLMs, with LLMs being used for data generation rather than text-based interactions through prompt engineering, thus not meeting the core subject requirement.
-
----
-
-## [End-to-End Long Document Summarization using Gradient Caching](https://arxiv.org/abs/http://arxiv.org/abs/2501.01805v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01805v1
-
-**Abstract:**
-> Training transformer-based encoder-decoder models for long document
-> summarization poses a significant challenge due to the quadratic memory
-> consumption during training. Several approaches have been proposed to extend
-> the input length at test time, but training with these approaches is still
-> difficult, requiring truncation of input documents and causing a mismatch
-> between training and test conditions. In this work, we propose CachED (Gradient
-> $\textbf{Cach}$ing for $\textbf{E}$ncoder-$\textbf{D}$ecoder models), an
-> approach that enables end-to-end training of existing transformer-based
-> encoder-decoder models, using the entire document without truncation.
-> Specifically, we apply non-overlapping sliding windows to input documents,
-> followed by fusion in decoder. During backpropagation, the gradients are cached
-> at the decoder and are passed through the encoder in chunks by re-computing the
-> hidden vectors, similar to gradient checkpointing. In the experiments on long
-> document summarization, we extend BART to CachED BART, processing more than
-> 500K tokens during training and achieving superior performance without using
-> any additional parameters.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing an optimization technique (Gradient Caching) for training transformer-based encoder-decoder models for long document summarization, rather than on engineering, designing, or optimizing prompts specifically for Large Language Models (LLMs).
-
----
-
-## [SDPO: Segment-Level Direct Preference Optimization for Social Agents](https://arxiv.org/abs/http://arxiv.org/abs/2501.01821v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01821v1
-
-**Abstract:**
-> Social agents powered by large language models (LLMs) can simulate human
-> social behaviors but fall short in handling complex goal-oriented social
-> dialogues. Direct Preference Optimization (DPO) has proven effective in
-> aligning LLM behavior with human preferences across a variety of agent tasks.
-> Existing DPO-based approaches for multi-turn interactions are divided into
-> turn-level and session-level methods. The turn-level method is overly
-> fine-grained, focusing exclusively on individual turns, while session-level
-> methods are too coarse-grained, often introducing training noise. To address
-> these limitations, we propose Segment-Level Direct Preference Optimization
-> (SDPO), which focuses on specific key segments within interactions to optimize
-> multi-turn agent behavior while minimizing training noise. Evaluations on the
-> SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform
-> both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring
-> SDPO's potential to advance the social intelligence of LLM-based agents. We
-> release our code and data at
-> https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on optimizing the behavior of social agents powered by LLMs through a new method (SDPO), but does not primarily investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts. Instead, it optimizes agent behavior through preference optimization, making prompt engineering not the central focus.
-
----
-
-## [The Proof is in the Almond Cookies](https://arxiv.org/abs/http://arxiv.org/abs/2501.01827v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01827v1
-
-**Abstract:**
-> This paper presents a case study on how to process cooking recipes (and more
-> generally, how-to instructions) in a way that makes it possible for a robot or
-> artificial cooking assistant to support human chefs in the kitchen. Such AI
-> assistants would be of great benefit to society, as they can help to sustain
-> the autonomy of aging adults or people with a physical impairment, or they may
-> reduce the stress in a professional kitchen. We propose a novel approach to
-> computational recipe understanding that mimics the human sense-making process,
-> which is narrative-based. Using an English recipe for almond crescent cookies
-> as illustration, we show how recipes can be modelled as rich narrative
-> structures by integrating various knowledge sources such as language
-> processing, ontologies, and mental simulation. We show how such narrative
-> structures can be used for (a) dealing with the challenges of recipe language,
-> such as zero anaphora, (b) optimizing a robot's planning process, (c) measuring
-> how well an AI system understands its current tasks, and (d) allowing recipe
-> annotations to become language-independent.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on computational recipe understanding for robot-assisted cooking, not on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output.
-
----
-
-## [MoColl: Agent-Based Specific and General Model Collaboration for Image
-  Captioning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01834v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01834v2
-
-**Abstract:**
-> Image captioning is a critical task at the intersection of computer vision
-> and natural language processing, with wide-ranging applications across various
-> domains. For complex tasks such as diagnostic report generation, deep learning
-> models require not only domain-specific image-caption datasets but also the
-> incorporation of relevant general knowledge to provide contextual accuracy.
-> Existing approaches exhibit inherent limitations: specialized models excel in
-> capturing domain-specific details but lack generalization, while
-> vision-language models (VLMs) built on large language models (LLMs) leverage
-> general knowledge but struggle with domain-specific adaptation. To address
-> these limitations, this paper proposes a novel agent-enhanced model
-> collaboration framework, which we call MoColl, designed to effectively
-> integrate domain-specific and general knowledge. Specifically, our approach is
-> to decompose complex image captioning tasks into a series of interconnected
-> question-answer subtasks. A trainable visual question answering (VQA) model is
-> employed as a specialized tool to focus on domain-specific visual analysis,
-> answering task-specific questions based on image content. Concurrently, an
-> LLM-based agent with general knowledge formulates these questions and
-> synthesizes the resulting question-answer pairs into coherent captions. Beyond
-> its role in leveraging the VQA model, the agent further guides its training to
-> enhance its domain-specific capabilities. Experimental results on radiology
-> report generation validate the effectiveness of the proposed framework,
-> demonstrating significant improvements in the quality of generated reports.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a collaborative framework for image captioning, leveraging both vision-language models (VLMs) and Large Language Models (LLMs), but does not center on prompt engineering for text-based interactions with LLMs. The LLM's role is in question formulation and synthesis, not in prompt engineering for improved LLM output.
-
----
-
-## [Practical machine learning is learning on small samples](https://arxiv.org/abs/http://arxiv.org/abs/2501.01836v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01836v1
-
-**Abstract:**
-> Based on limited observations, machine learning discerns a dependence which
-> is expected to hold in the future. What makes it possible? Statistical learning
-> theory imagines indefinitely increasing training sample to justify its
-> approach. In reality, there is no infinite time or even infinite general
-> population for learning. Here I argue that practical machine learning is based
-> on an implicit assumption that underlying dependence is relatively "smooth":
-> likely, there are no abrupt differences in feedback between cases with close
-> data points. From this point of view learning shall involve selection of the
-> hypothesis "smoothly" approximating the training set. I formalize this as
-> Practical learning paradigm. The paradigm includes terminology and rules for
-> description of learners. Popular learners (local smoothing, k-NN, decision
-> trees, Naive Bayes, SVM for classification and for regression) are shown here
-> to be implementations of this paradigm.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria: it does not focus on prompt engineering for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompts. Instead, it discusses a general machine learning paradigm for learning from small samples, unrelated to LLMs or prompt engineering.
-
----
-
-## [Multi-Agent Conversational Online Learning for Adaptive LLM Response
-  Identification](https://arxiv.org/abs/http://arxiv.org/abs/2501.01849v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01849v1
-
-**Abstract:**
-> The remarkable generative capability of large language models (LLMs) has
-> sparked a growing interest in automatically generating responses for different
-> applications. Given the dynamic nature of user preferences and the uncertainty
-> of LLM response performance, it is crucial to design efficient online learning
-> algorithms to identify optimal LLM responses (i.e., high-quality responses that
-> also meet user preferences). Most existing online algorithms adopt a
-> centralized approach and fail to leverage explicit user preferences for more
-> efficient and personalized LLM response identification. In contrast, this paper
-> introduces MACO (Multi-Agent
-> Conversational Online Learning for Adaptive LLM
-> Response Identification): 1) The online LLM response identification process is
-> accelerated by multiple local agents (such as smartphones), while enhancing
-> data privacy; 2) A novel conversational mechanism is proposed to adaptively
-> conduct conversations for soliciting user preferences (e.g., a preference for a
-> humorous tone over a serious one in generated responses), so as to minimize
-> uncertainty in preference estimation. Our theoretical analysis demonstrates
-> that MACO is near-optimal regarding cumulative regret. Additionally, MACO
-> offers reduced communication costs and computational complexity by eliminating
-> the traditional, computing-intensive "G-optimal design" found in previous
-> works. Extensive experiments with the open LLM Llama, coupled with two
-> different embedding models from Google and OpenAI for text vector
-> representation, demonstrate that MACO significantly outperforms the current
-> state-of-the-art in online LLM response identification.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on multi-agent conversational online learning for adapting LLM responses, not on prompt engineering specifically for Large Language Models (LLMs). It fails to meet the 'MUST' criteria by not investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [Accuracy Can Lie: On the Impact of Surrogate Model in Configuration
-  Tuning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01876v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01876v1
-
-**Abstract:**
-> To ease the expensive measurements during configuration tuning, it is natural
-> to build a surrogate model as the replacement of the system, and thereby the
-> configuration performance can be cheaply evaluated. Yet, a stereotype therein
-> is that the higher the model accuracy, the better the tuning result would be.
-> This "accuracy is all" belief drives our research community to build more and
-> more accurate models and criticize a tuner for the inaccuracy of the model
-> used. However, this practice raises some previously unaddressed questions,
-> e.g., Do those somewhat small accuracy improvements reported in existing work
-> really matter much to the tuners? What role does model accuracy play in the
-> impact of tuning quality? To answer those related questions, we conduct one of
-> the largest-scale empirical studies to date (running over the period of 13
-> months 24*7) that covers 10 models, 17 tuners, and 29 systems from the existing
-> works while under four different commonly used metrics, leading to 13,612 cases
-> of investigation. Surprisingly, our key findings reveal that the accuracy can
-> lie: there are a considerable number of cases where higher accuracy actually
-> leads to no improvement in the tuning outcomes (up to 58% cases under certain
-> setting), or even worse, it can degrade the tuning quality (up to 24% cases
-> under certain setting). We also discover that the chosen models in most
-> proposed tuners are sub-optimal and that the required % of accuracy change to
-> significantly improve tuning quality varies according to the range of model
-> accuracy. Deriving from the fitness landscape analysis, we provide in-depth
-> discussions of the rationale behind, offering several lessons learned as well
-> as insights for future opportunities. Most importantly, this work poses a clear
-> message to the community: we should take one step back from the natural
-> "accuracy is all" belief for model-based configuration tuning.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on the impact of surrogate model accuracy in configuration tuning, which does not meet the 'MUST' criteria of primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Virgo: A Preliminary Exploration on Reproducing o1-like MLLM](https://arxiv.org/abs/http://arxiv.org/abs/2501.01904v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01904v1
-
-**Abstract:**
-> Recently, slow-thinking reasoning systems, built upon large language models
-> (LLMs), have garnered widespread attention by scaling the thinking time during
-> inference. There is also growing interest in adapting this capability to
-> multimodal large language models (MLLMs). Given that MLLMs handle more complex
-> data semantics across different modalities, it is intuitively more challenging
-> to implement multimodal slow-thinking systems.
-> To address this issue, in this paper, we explore a straightforward approach
-> by fine-tuning a capable MLLM with a small amount of textual long-form thought
-> data, resulting in a multimodal slow-thinking system, Virgo (Visual reasoning
-> with long thought). We find that these long-form reasoning processes, expressed
-> in natural language, can be effectively transferred to MLLMs. Moreover, it
-> seems that such textual reasoning data can be even more effective than visual
-> reasoning data in eliciting the slow-thinking capacities of MLLMs. While this
-> work is preliminary, it demonstrates that slow-thinking capacities are
-> fundamentally associated with the language model component, which can be
-> transferred across modalities or domains. This finding can be leveraged to
-> guide the development of more powerful slow-thinking reasoning systems. We
-> release our resources at https://github.com/RUCAIBox/Virgo.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on fine-tuning a multimodal large language model (MLLM) for slow-thinking reasoning, which falls under developing new training methods for LLM architectures, rather than engineering prompts specifically for improving LLM performance through textual input manipulation.
-
----
-
-## [Mitigating Hallucination for Large Vision Language Model by
-  Inter-Modality Correlation Calibration Decoding](https://arxiv.org/abs/http://arxiv.org/abs/2501.01926v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01926v1
-
-**Abstract:**
-> Large vision-language models (LVLMs) have shown remarkable capabilities in
-> visual-language understanding for downstream multi-modal tasks. Despite their
-> success, LVLMs still suffer from generating hallucinations in complex
-> generation tasks, leading to inconsistencies between visual inputs and
-> generated content. To address this issue, some approaches have introduced
-> inference-time interventions, such as contrastive decoding and attention
-> rectification, to reduce overreliance on language priors. However, these
-> approaches overlook hallucinations stemming from spurious inter-modality
-> correlations. In this paper, we propose an Inter-Modality Correlation
-> Calibration Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a
-> training-free manner. In this method, we design a Cross-Modal Value-Enhanced
-> Decoding (CMVED) module to alleviate hallucination by a novel contrastive
-> decoding mechanism. During the estimation of distorted distribution, CMVED
-> masks the value vectors associated with significant cross-modal attention
-> weights, which address both uni-modality overreliance and misleading
-> inter-modality correlations. Additionally, a Content-Driven Attention
-> Refinement (CDAR) module refines cross-modal attention weights, guiding LVLMs to
-> focus on important visual content. Experimental results on diverse
-> hallucination benchmarks validate the superiority of our method over existing
-> state-of-the-art techniques in reducing hallucinations in LVLM text generation.
-> Our code will be available at https://github.com/lijm48/IMCCD.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on mitigating hallucinations in Large Vision Language Models (LVLMs) through decoding mechanisms, not on the engineering, design, or optimization of textual input prompts for Large Language Models (LLMs), and does not provide concrete examples of prompt manipulation impacting LLM output."
-}
-
----
-
-## [Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues
-  and Challenges](https://arxiv.org/abs/http://arxiv.org/abs/2501.01933v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01933v1
-
-**Abstract:**
-> This thesis presents Abstractive Text Summarization models for contemporary
-> Sanskrit prose. The first chapter, titled Introduction, presents the motivation
-> behind this work, the research questions, and the conceptual framework.
-> Sanskrit is a low-resource inflectional language. The key research question
-> that this thesis investigates is what the challenges are in developing an
-> abstractive TS for Sanskrit. To answer the key research questions,
-> sub-questions based on four different themes have been posed in this work. The
-> second chapter, Literature Review, surveys the previous works done. The third
-> chapter, data preparation, answers the remaining three questions from the third
-> theme. It reports the data collection and preprocessing challenges for both
-> language model and summarization model trainings. The fourth chapter reports
-> the training and inference of models and the results obtained therein. This
-> research has initiated a pipeline for Sanskrit abstractive text summarization
-> and has reported the challenges faced at every stage of the development. The
-> research questions based on every theme have been answered to answer the key
-> research question.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing Abstractive Text Summarization models for Sanskrit prose, discussing language-specific challenges and model training, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Cold-Start Recommendation towards the Era of Large Language Models
-  (LLMs): A Comprehensive Survey and Roadmap](https://arxiv.org/abs/http://arxiv.org/abs/2501.01945v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01945v2
-
-**Abstract:**
-> Cold-start problem is one of the long-standing challenges in recommender
-> systems, focusing on accurately modeling new or interaction-limited users or
-> items to provide better recommendations. Due to the diversification of internet
-> platforms and the exponential growth of users and items, the importance of
-> cold-start recommendation (CSR) is becoming increasingly evident. At the same
-> time, large language models (LLMs) have achieved tremendous success and possess
-> strong capabilities in modeling user and item information, providing new
-> potential for cold-start recommendations. However, the research community on
-> CSR still lacks a comprehensive review and reflection in this field. Based on
-> this, in this paper, we stand in the context of the era of large language
-> models and provide a comprehensive review and discussion on the roadmap,
-> related literature, and future directions of CSR. Specifically, we have
-> conducted an exploration of the development path of how existing CSR utilizes
-> information, from content features, graph relations, and domain information, to
-> the world knowledge possessed by large language models, aiming to provide new
-> insights for both the research and industrial communities on CSR. Related
-> resources of cold-start recommendations are collected and continuously updated
-> for the community in
-> https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on cold-start recommendation systems utilizing Large Language Models (LLMs), but does not concentrate on prompt engineering for text-based interactions with LLMs. Instead, it leverages LLMs as a component for enhancing recommender systems, which does not meet the core subject requirement of prompt engineering being the central focus.
-
----
-
-## [MADGEN: Mass-Spec attends to De Novo Molecular generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01950v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01950v2
-
-**Abstract:**
-> The annotation (assigning structural chemical identities) of MS/MS spectra
-> remains a significant challenge due to the enormous molecular diversity in
-> biological samples and the limited scope of reference databases. Currently, the
-> vast majority of spectral measurements remain in the "dark chemical space"
-> without structural annotations. To improve annotation, we propose MADGEN
-> (Mass-spec Attends to De Novo Molecular GENeration), a scaffold-based method
-> for de novo molecular structure generation guided by mass spectrometry data.
-> MADGEN operates in two stages: scaffold retrieval and spectra-conditioned
-> molecular generation starting with the scaffold. In the first stage, given an
-> MS/MS spectrum, we formulate scaffold retrieval as a ranking problem and employ
-> contrastive learning to align mass spectra with candidate molecular scaffolds.
-> In the second stage, starting from the retrieved scaffold, we employ the MS/MS
-> spectrum to guide an attention-based generative model to generate the final
-> molecule. Our approach constrains the molecular generation search space,
-> reducing its complexity and improving generation accuracy. We evaluate MADGEN
-> on three datasets (NIST23, CANOPUS, and MassSpecGym) and evaluate MADGEN's
-> performance with a predictive scaffold retriever and with an oracle retriever.
-> We demonstrate the effectiveness of using attention to integrate spectral
-> information throughout the generation process to achieve strong results with
-> the oracle retriever.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on de novo molecular structure generation using mass spectrometry data, with attention-based generative models, and does not meet the core requirement of focusing on prompt engineering specifically for Large Language Models (LLMs) or manipulating textual input prompts to improve LLM performance.
-
----
-
-## [MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of
-  Accelerators](https://arxiv.org/abs/http://arxiv.org/abs/2501.01951v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01951v2
-
-**Abstract:**
-> Graph convolutional networks (GCNs) have demonstrated superiority in
-> graph-based learning tasks. However, training GCNs on full graphs is
-> particularly challenging, due to the following two challenges: (1) the
-> associated feature tensors can easily explode the memory and block the
-> communication bandwidth of modern accelerators, and (2) the computation
-> workflow in training GCNs alternates between sparse and dense matrix
-> operations, complicating the efficient utilization of computational resources.
-> Existing solutions for scalable distributed full-graph GCN training mostly
-> adopt partition parallelism, which is unsatisfactory as they only partially
-> address the first challenge while incurring scaled-out communication volume. To
-> this end, we propose MixGCN aiming to simultaneously address both the
-> aforementioned challenges towards GCN training. To tackle the first challenge,
-> MixGCN integrates mixture of parallelism. Both theoretical and empirical
-> analysis verify its constant communication volumes and enhanced balanced
-> workload. For handling the second challenge, we consider mixture of
-> accelerators (i.e., sparse and dense accelerators) with a dedicated accelerator
-> for GCN training and a fine-grain pipeline. Extensive experiments show that
-> MixGCN achieves boosted training efficiency and scalability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on scalable training of Graph Convolutional Networks (GCNs) using parallelism and mixture of accelerators, with no mention of Large Language Models (LLMs) or prompt engineering, thus failing to meet all 'MUST' criteria.
-
----
-
-## [SmartSpatial: Enhancing the 3D Spatial Arrangement Capabilities of
-  Stable Diffusion Models and Introducing a Novel 3D Spatial Evaluation
-  Framework](https://arxiv.org/abs/http://arxiv.org/abs/2501.01998v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01998v1
-
-**Abstract:**
-> Stable Diffusion models have made remarkable strides in generating
-> photorealistic images from text prompts but often falter when tasked with
-> accurately representing complex spatial arrangements, particularly involving
-> intricate 3D relationships. To address this limitation, we introduce
-> SmartSpatial, an innovative approach that enhances the spatial arrangement
-> capabilities of Stable Diffusion models through 3D-aware conditioning and
-> attention-guided mechanisms. SmartSpatial incorporates depth information and
-> employs cross-attention control to ensure precise object placement, delivering
-> notable improvements in spatial accuracy metrics. In conjunction with
-> SmartSpatial, we present SmartSpatialEval, a comprehensive evaluation framework
-> designed to assess spatial relationships. This framework utilizes
-> vision-language models and graph-based dependency parsing for performance
-> analysis. Experimental results on the COCO and SpatialPrompts datasets show
-> that SmartSpatial significantly outperforms existing methods, setting new
-> benchmarks for spatial arrangement accuracy in image generation.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on enhancing Stable Diffusion models for 3D spatial arrangement in image generation, which falls under image generation driven by generative AI models other than Large Language Models (LLMs) for text generation, violating the 'MUST NOT' criteria."
-}
-
----
-
-## [Multi-Task Semantic Communication With Graph Attention-Based Feature
-  Correlation Extraction](https://arxiv.org/abs/http://arxiv.org/abs/2501.02006v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02006v1
-
-**Abstract:**
-> Multi-task semantic communication can serve multiple learning tasks using a
-> shared encoder model. Existing models have overlooked the intricate
-> relationships between features extracted during an encoding process of tasks.
-> This paper presents a new graph attention inter-block (GAI) module to the
-> encoder/transmitter of a multi-task semantic communication system, which
-> enriches the features for multiple tasks by embedding the intermediate outputs
-> of encoding in the features, compared to the existing techniques. The key idea
-> is that we interpret the outputs of the intermediate feature extraction blocks
-> of the encoder as the nodes of a graph to capture the correlations of the
-> intermediate features. Another important aspect is that we refine the node
-> representation using a graph attention mechanism to extract the correlations
-> and a multi-layer perceptron network to associate the node representations with
-> different tasks. Consequently, the intermediate features are weighted and
-> embedded into the features transmitted for executing multiple tasks at the
-> receiver. Experiments demonstrate that the proposed model surpasses the most
-> competitive and publicly available models by 11.4% on the CityScapes 2Task
-> dataset and outperforms the established state-of-the-art by 3.97% on the NYU V2
-> 3Task dataset, respectively, when the bandwidth ratio of the communication
-> channel (i.e., compression level for transmission over the channel) is as
-> constrained as 1/12.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on developing a new module for a multi-task semantic communication system, optimizing feature extraction with graph attention, and does not primarily investigate prompt engineering for Large Language Models (LLMs) or analyze the impact of textual input prompts on LLM output."
-}
-
----
-
-## [TART: Token-based Architecture Transformer for Neural Network
-  Performance Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.02007v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02007v1
-
-**Abstract:**
-> In the realm of neural architecture design, achieving high performance is
-> largely reliant on the manual expertise of researchers. Despite the emergence
-> of Neural Architecture Search (NAS) as a promising technique for automating
-> this process, current NAS methods still require human input to expand the
-> search space and cannot generate new architectures. This paper explores the
-> potential of Transformers in comprehending neural architectures and their
-> performance, with the objective of establishing the foundation for utilizing
-> Transformers to generate novel networks. We propose the Token-based
-> Architecture Transformer (TART), which predicts neural network performance
-> without the need to train candidate networks. TART attains state-of-the-art
-> performance on the DeepNets-1M dataset for performance prediction tasks without
-> edge information, indicating the potential of Transformers to aid in
-> discovering novel and high-performing neural architectures.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on utilizing Transformers for Neural Architecture Search (NAS) and predicting neural network performance, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), failing to meet the mandatory criteria.
-
----
-
-## [Cross-model Transferability among Large Language Models on the Platonic
-  Representations of Concepts](https://arxiv.org/abs/http://arxiv.org/abs/2501.02009v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02009v1
-
-**Abstract:**
-> Understanding the inner workings of Large Language Models (LLMs) is a
-> critical research frontier. Prior research has shown that a single LLM's
-> concept representations can be captured as steering vectors (SVs), enabling the
-> control of LLM behavior (e.g., towards generating harmful content). Our work
-> takes a novel approach by exploring the intricate relationships between concept
-> representations across different LLMs, drawing an intriguing parallel to
-> Plato's Allegory of the Cave. In particular, we introduce a linear
-> transformation method to bridge these representations and present three key
-> findings: 1) Concept representations across different LLMs can be effectively
-> aligned using simple linear transformations, enabling efficient cross-model
-> transfer and behavioral control via SVs. 2) This linear transformation
-> generalizes across concepts, facilitating alignment and control of SVs
-> representing different concepts across LLMs. 3) A weak-to-strong
-> transferability exists between LLM concept representations, whereby SVs
-> extracted from smaller LLMs can effectively control the behavior of larger
-> LLMs.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on understanding concept representations across different LLMs and developing a linear transformation method for alignment, rather than specifically engineering or optimizing prompts for improving LLM text generation performance through textual input manipulation."
-}
-
----
-
-## [Machine Learning-Based Differential Diagnosis of Parkinson's Disease
-  Using Kinematic Feature Extraction and Selection](https://arxiv.org/abs/http://arxiv.org/abs/2501.02014v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02014v1
-
-**Abstract:**
-> Parkinson's disease (PD), the second most common neurodegenerative disorder,
-> is characterized by dopaminergic neuron loss and the accumulation of abnormal
-> synuclein. PD presents both motor and non-motor symptoms that progressively
-> impair daily functioning. The severity of these symptoms is typically assessed
-> using the MDS-UPDRS rating scale, which is subjective and dependent on the
-> physician's experience. Additionally, PD shares symptoms with other
-> neurodegenerative diseases, such as progressive supranuclear palsy (PSP) and
-> multiple system atrophy (MSA), complicating accurate diagnosis. To address
-> these diagnostic challenges, we propose a machine learning-based system for
-> differential diagnosis of PD, PSP, MSA, and healthy controls (HC). This system
-> utilizes a kinematic feature-based hierarchical feature extraction and
-> selection approach. Initially, 18 kinematic features are extracted, including
-> two newly proposed features: Thumb-to-index vector velocity and acceleration,
-> which provide insights into motor control patterns. In addition, 41 statistical
-> features were extracted here from each kinematic feature, including some new
-> approaches such as Average Absolute Change, Rhythm, Amplitude, Frequency,
-> Standard Deviation of Frequency, and Slope. Feature selection is performed
-> using One-way ANOVA to rank features, followed by Sequential Forward Floating
-> Selection (SFFS) to identify the most relevant ones, aiming to reduce the
-> computational complexity. The final feature set is used for classification,
-> achieving a classification accuracy of 66.67% for each dataset and 88.89% for
-> each patient, with particularly high performance for the MSA and HC groups
-> using the SVM algorithm. This system shows potential as a rapid and accurate
-> diagnostic tool in clinical practice, though further data collection and
-> refinement are needed to enhance its reliability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on machine learning for differential diagnosis of Parkinson's disease using kinematic feature extraction and selection, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, failing to meet all 'MUST' criteria.
-
----
-
-## [Enhancing Uncertainty Modeling with Semantic Graph for Hallucination
-  Detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.02020v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02020v1
-
-**Abstract:**
-> Large Language Models (LLMs) are prone to hallucination with non-factual or
-> unfaithful statements, which undermines the applications in real-world
-> scenarios. Recent research focuses on uncertainty-based hallucination
-> detection, which utilizes the output probability of LLMs for uncertainty
-> calculation and does not rely on external knowledge or frequent sampling from
-> LLMs. However, most approaches merely consider the uncertainty of each
-> independent token, while the intricate semantic relations among tokens and
-> sentences are not well studied, which limits the detection of hallucination
-> that spans over multiple tokens and sentences in the passage. In this paper, we
-> propose a method to enhance uncertainty modeling with semantic graph for
-> hallucination detection. Specifically, we first construct a semantic graph that
-> well captures the relations among entity tokens and sentences. Then, we
-> incorporate the relations between two entities for uncertainty propagation to
-> enhance sentence-level hallucination detection. Given that hallucination occurs
-> due to the conflict between sentences, we further present a graph-based
-> uncertainty calibration method that integrates the contradiction probability of
-> the sentence with its neighbors in the semantic graph for uncertainty
-> calculation. Extensive experiments on two datasets show the great advantages of
-> our proposed approach. In particular, we obtain substantial improvements with
-> 19.78% in passage-level hallucination detection.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing uncertainty modeling for hallucination detection in LLMs, rather than prompt engineering, optimization, or manipulation of textual input prompts to improve LLM performance, thus not meeting the core subject requirement.
-
----
-
-## [Weakly Supervised Learning on Large Graphs](https://arxiv.org/abs/http://arxiv.org/abs/2501.02021v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02021v1
-
-**Abstract:**
-> Graph classification plays a pivotal role in various domains, including
-> pathology, where images can be represented as graphs. In this domain, nodes
-> might represent individual nuclei, and
-> edges capture the spatial or functional relationships between them. Often, the
-> overall label of the graph, such as a cancer type or disease state, is
-> determined by patterns within smaller, localized regions of the image. This
-> work introduces a weakly-supervised graph classification framework leveraging
-> two subgraph extraction techniques: (1) a sliding-window approach and (2) a BFS-based
-> approach. Subgraphs are processed using a Graph Attention Network (GAT), which
-> employs attention mechanisms to identify the most informative subgraphs for
-> classification. Weak supervision is achieved by propagating graph-level labels
-> to subgraphs, eliminating the need for detailed subgraph annotations.
-
-**Decision Explanation:**
-Original decision: REJECT
-This paper focuses on weakly supervised learning for graph classification using Graph Attention Networks, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [CarbonChat: Large Language Model-Based Corporate Carbon Emission
-  Analysis and Climate Knowledge Q&A System](https://arxiv.org/abs/http://arxiv.org/abs/2501.02031v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02031v1
-
-**Abstract:**
-> As the impact of global climate change intensifies, corporate carbon
-> emissions have become a focal point of global attention. In response to issues
-> such as the lag in climate change knowledge updates within large language
-> models, the lack of specialization and accuracy in traditional augmented
-> generation architectures for complex problems, and the high cost and time
-> consumption of sustainability report analysis, this paper proposes CarbonChat:
-> Large Language Model-based corporate carbon emission analysis and climate
-> knowledge Q&A system, aimed at achieving precise carbon emission analysis and
-> policy understanding. First, a diversified index module construction method is
-> proposed to handle the segmentation of rule-based and long-text documents, as
-> well as the extraction of structured data, thereby optimizing the parsing of
-> key information. Second, an enhanced self-prompt retrieval-augmented generation
-> architecture is designed, integrating intent recognition, structured reasoning
-> chains, hybrid retrieval, and Text2SQL, improving the efficiency of semantic
-> understanding and query conversion. Next, based on the greenhouse gas accounting
-> framework, 14 dimensions are established for carbon emission analysis, enabling
-> report summarization, relevance evaluation, and customized responses. Finally,
-> through a multi-layer chunking mechanism, timestamps, and hallucination
-> detection features, the accuracy and verifiability of the analysis results are
-> ensured, reducing hallucination rates and enhancing the precision of the
-> responses.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a corporate carbon emission analysis and climate knowledge Q&A system using LLMs, rather than specifically on prompt engineering for LLMs. While it mentions a 'self-prompt retrieval-augmented generation architecture', prompt engineering is not the central concern, but rather a component of the larger system.
-
----
-
-## [3D Cloud reconstruction through geospatially-aware Masked Autoencoders](https://arxiv.org/abs/http://arxiv.org/abs/2501.02035v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02035v1
-
-**Abstract:**
-> Clouds play a key role in Earth's radiation balance with complex effects that
-> introduce large uncertainties into climate models. Real-time 3D cloud data is
-> essential for improving climate predictions. This study leverages geostationary
-> imagery from MSG/SEVIRI and radar reflectivity measurements of cloud profiles
-> from CloudSat/CPR to reconstruct 3D cloud structures. We first apply
-> self-supervised learning (SSL) methods, namely Masked Autoencoders (MAE) and
-> geospatially-aware SatMAE, on unlabelled MSG images, and then fine-tune our
-> models on matched image-profile pairs. Our approach outperforms
-> state-of-the-art methods like U-Nets, and our geospatial encoding further
-> improves prediction results, demonstrating the potential of SSL for cloud
-> reconstruction.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on 3D cloud reconstruction using geospatially-aware Masked Autoencoders, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Architecture for Trajectory-Based Fishing Ship Classification with AIS
-  Data](https://arxiv.org/abs/http://arxiv.org/abs/2501.02038v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02038v1
-
-**Abstract:**
-> This paper proposes a data preparation process for managing real-world
-> kinematic data and detecting fishing vessels. The solution is a binary
-> classification that classifies ship trajectories into either fishing or
-> non-fishing ships. The data used are characterized by the typical problems
-> found in classic data mining applications using real-world data, such as noise
-> and inconsistencies. The two classes are also clearly unbalanced in the data, a
-> problem which is addressed using algorithms that resample the instances. For
-> classification, a series of features are extracted from spatiotemporal data
-> that represent the trajectories of the ships, available from sequences of
-> Automatic Identification System (AIS) reports. These features are proposed for
-> the modelling of ship behavior but, because they do not contain context-related
-> information, the classification can be applied in other scenarios.
-> Experimentation shows that the proposed data preparation process is useful for
-> the presented classification problem. In addition, positive results are
-> obtained using minimal information.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing an architecture for ship classification using AIS data and kinematic data, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, failing to meet the primary 'MUST' criteria.
-
----
-
-## [An Investigation into Value Misalignment in LLM-Generated Texts for
-  Cultural Heritage](https://arxiv.org/abs/http://arxiv.org/abs/2501.02039v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02039v1
-
-**Abstract:**
-> As Large Language Models (LLMs) become increasingly prevalent in tasks
-> related to cultural heritage, such as generating descriptions of historical
-> monuments, translating ancient texts, preserving oral traditions, and creating
-> educational content, their ability to produce accurate and culturally aligned
-> texts is being increasingly relied upon by users and researchers. However,
-> cultural value misalignments may exist in generated texts, such as the
-> misrepresentation of historical facts, the erosion of cultural identity, and
-> the oversimplification of complex cultural narratives, which may lead to severe
-> consequences. Therefore, investigating value misalignment in the context of LLM
-> for cultural heritage is crucial for mitigating these risks, yet there has been
-> a significant lack of systematic and comprehensive study and investigation in
-> this area. To fill this gap, we systematically assess the reliability of LLMs
-> in generating culturally aligned texts for cultural heritage-related tasks. We
-> conduct a comprehensive evaluation by compiling an extensive set of 1066 query
-> tasks covering 5 widely recognized categories with 17 aspects within the
-> knowledge framework of cultural heritage across 5 open-source LLMs, and examine
-> both the type and rate of cultural value misalignments in the generated texts.
-> Using both automated and manual approaches, we effectively detect and analyze
-> the cultural value misalignments in LLM-generated texts. Our findings are
-> concerning: over 65% of the generated texts exhibit notable cultural
-> misalignments, with certain tasks demonstrating almost complete misalignment
-> with key cultural values. Beyond these findings, this paper introduces a
-> benchmark dataset and a comprehensive evaluation workflow that can serve as a
-> valuable resource for future research aimed at enhancing the cultural
-> sensitivity and reliability of LLMs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on evaluating cultural value misalignment in LLM-generated texts for cultural heritage, rather than investigating methods for improving LLM performance through the manipulation of textual input prompts, and does not provide concrete examples of prompt engineering techniques.
-
----
-
-## [A Separable Self-attention Inspired by the State Space Model for
-  Computer Vision](https://arxiv.org/abs/http://arxiv.org/abs/2501.02040v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02040v1
-
-**Abstract:**
-> Mamba is an efficient State Space Model (SSM) with linear computational
-> complexity. Although SSMs are not suitable for handling non-causal data, Vision
-> Mamba (ViM) methods still demonstrate good performance in tasks such as image
-> classification and object detection. Recent studies have shown that there is a
-> rich theoretical connection between state space models and attention variants.
-> We propose a novel separable self-attention method, for the first time
-> introducing some excellent design concepts of Mamba into separable
-> self-attention. To ensure a fair comparison with ViMs, we introduce VMINet, a
-> simple yet powerful prototype architecture, constructed solely by stacking our
-> novel attention modules with the most basic down-sampling layers. Notably,
-> VMINet differs significantly from the conventional Transformer architecture.
-> Our experiments demonstrate that VMINet has achieved competitive results on
-> image classification and high-resolution dense prediction tasks. Code is
-> available at: <https://github.com/yws-wxs/VMINet>.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on computer vision, proposing a novel separable self-attention method and its application in image classification and object detection, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts.
-
----
-
-## [MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using
-  Multi-Instance Point Cloud Registration](https://arxiv.org/abs/http://arxiv.org/abs/2501.02041v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02041v1
-
-**Abstract:**
-> A high-fidelity digital simulation environment is crucial for accurately
-> replicating physical operational processes. However, inconsistencies between
-> simulation and physical environments result in low confidence in simulation
-> outcomes, limiting their effectiveness in guiding real-world production. Unlike
-> the traditional step-by-step point cloud "segmentation-registration" generation
-> method, this paper introduces, for the first time, a novel Multi-Robot
-> Manufacturing Digital Scene Generation (MRG) method that leverages
-> multi-instance point cloud registration, specifically within manufacturing
-> scenes. Tailored to the characteristics of industrial robots and manufacturing
-> settings, an instance-focused transformer module is developed to delineate
-> instance boundaries and capture correlations between local regions.
-> Additionally, a hypothesis generation module is proposed to extract target
-> instances while preserving key features. Finally, an efficient screening and
-> optimization algorithm is designed to refine the final registration results.
-> Experimental evaluations on the Scan2CAD and Welding-Station datasets
-> demonstrate that: (1) the proposed method outperforms existing multi-instance
-> point cloud registration techniques; (2) compared to state-of-the-art methods,
-> the Scan2CAD dataset achieves improvements in MR and MP by 12.15% and 17.79%,
-> respectively; and (3) on the Welding-Station dataset, MR and MP are enhanced by
-> 16.95% and 24.15%, respectively. This work marks the first application of
-> multi-instance point cloud registration in manufacturing scenes, significantly
-> advancing the precision and reliability of digital simulation environments for
-> industrial applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-instance point cloud registration for digital scene generation in manufacturing, with no indication of involving Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [Advancing Pancreatic Cancer Prediction with a Next Visit Token
-  Prediction Head on top of Med-BERT](https://arxiv.org/abs/http://arxiv.org/abs/2501.02044v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02044v1
-
-**Abstract:**
-> Background: Recently, numerous foundation models pretrained on extensive data
-> have demonstrated efficacy in disease prediction using Electronic Health
-> Records (EHRs). However, there remain some unanswered questions on how to best
-> utilize such models, especially with very small fine-tuning cohorts. Methods: We
-> utilized Med-BERT, an EHR-specific foundation model, and reformulated the
-> disease binary prediction task into a token prediction task and a next visit
-> mask token prediction task to align with Med-BERT's pretraining task format in
-> order to improve the accuracy of pancreatic cancer (PaCa) prediction in both
-> few-shot and fully supervised settings. Results: The reformulation of the task
-> into a token prediction task, referred to as Med-BERT-Sum, demonstrates
-> slightly superior performance in both few-shot scenarios and larger data
-> samples. Furthermore, reformulating the prediction task as a Next Visit Mask
-> Token Prediction task (Med-BERT-Mask) significantly outperforms the
-> conventional Binary Classification (BC) prediction task (Med-BERT-BC) by 3% to
-> 7% in few-shot scenarios with data sizes ranging from 10 to 500 samples. These
-> findings highlight that aligning the downstream task with Med-BERT's
-> pretraining objectives substantially enhances the model's predictive
-> capabilities, thereby improving its effectiveness in predicting both rare and
-> common diseases. Conclusion: Reformatting disease prediction tasks to align
-> with the pretraining of foundation models enhances prediction accuracy, leading
-> to earlier detection and timely intervention. This approach improves treatment
-> effectiveness, survival rates, and overall patient outcomes for PaCa and
-> potentially other cancers.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on optimizing disease prediction tasks using Med-BERT, a foundation model, by reformulating the prediction task, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs). The core subject is not prompt engineering for text-based interactions with LLMs, but rather adapting task formats to improve model performance in a medical context, which is explicitly excluded.
-
----
-
-## [ArtCrafter: Text-Image Aligning Style Transfer via Embedding Reframing](https://arxiv.org/abs/http://arxiv.org/abs/2501.02064v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02064v1
-
-**Abstract:**
-> Recent years have witnessed significant advancements in text-guided style
-> transfer, primarily attributed to innovations in diffusion models. These models
-> excel in conditional guidance, utilizing text or images to direct the sampling
-> process. However, despite their capabilities, direct conditional guidance
-> approaches often face challenges in balancing the expressiveness of textual
-> semantics with the diversity of output results while capturing stylistic
-> features. To address these challenges, we introduce ArtCrafter, a novel
-> framework for text-to-image style transfer. Specifically, we introduce an
-> attention-based style extraction module, meticulously engineered to capture the
-> subtle stylistic elements within an image. This module features a multi-layer
-> architecture that leverages the capabilities of perceiver attention mechanisms
-> to integrate fine-grained information. Additionally, we present a novel
-> text-image aligning augmentation component that adeptly balances control over
-> both modalities, enabling the model to efficiently map image and text
-> embeddings into a shared feature space. We achieve this through attention
-> operations that enable smooth information flow between modalities. Lastly, we
-> incorporate an explicit modulation that seamlessly blends multimodal enhanced
-> embeddings with original embeddings through an embedding reframing design,
-> empowering the model to generate diverse outputs. Extensive experiments
-> demonstrate that ArtCrafter yields impressive results in visual stylization,
-> exhibiting exceptional levels of stylistic intensity, controllability, and
-> diversity.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on text-to-image style transfer using diffusion models, which falls under image generation driven by generative AI models other than Large Language Models (LLMs) for text generation, violating the 'Papers MUST NOT' criterion 2.
-
----
-
-## [The interplay between domain specialization and model size: a case study
-  in the legal domain](https://arxiv.org/abs/http://arxiv.org/abs/2501.02068v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02068v1
-
-**Abstract:**
-> Scaling laws for language models so far focused on finding the
-> compute-optimal model size and token count for training from scratch. However,
-> achieving this optimal balance requires significant compute resources due to
-> the extensive data demands when training models from randomly-initialized
-> weights. Continual pre-training offers a cost-effective alternative, leveraging
-> the compute investment from pre-trained models to incorporate new knowledge
-> without requiring extensive new data. Recent findings suggest that data quality
-> influences constants in scaling laws, thereby altering the optimal
-> parameter-token allocation ratio. Building on this insight, we investigate the
-> interplay between domain specialization and model size during continual
-> pre-training under compute-constrained scenarios. Our goal is to identify a
-> compute-efficient training regime for this scenario and, potentially, detect
-> patterns in this interplay that can be generalized across different model sizes
-> and domains. To compare general and specialized training, we filtered a
-> web-based dataset to extract legal domain data. We pre-trained models with
-> 1.5B, 3B, 7B and 14B parameters on both the unfiltered and filtered datasets,
-> then evaluated their performance on legal exams. Results show that as model
-> size increases, the compute-effectiveness gap between specialized and general
-> models widens.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the interplay between domain specialization and model size for continual pre-training, which falls under developing new training methods for LLMs, rather than prompt engineering for text-based interactions with LLMs.
-
----
-
-## [On the Statistical Complexity for Offline and Low-Adaptive Reinforcement
-  Learning with Structures](https://arxiv.org/abs/http://arxiv.org/abs/2501.02089v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02089v1
-
-**Abstract:**
-> This article reviews the recent advances on the statistical foundation of
-> reinforcement learning (RL) in the offline and low-adaptive settings. We will
-> start by arguing why offline RL is the appropriate model for almost any
-> real-life ML problem, even if it has nothing to do with the recent AI
-> breakthroughs that use RL. Then we will zoom into two fundamental problems of
-> offline RL: offline policy evaluation (OPE) and offline policy learning (OPL).
-> It may be surprising to people that tight bounds for these problems were not
-> known even for tabular and linear cases until recently. We delineate the
-> differences between worst-case minimax bounds and instance-dependent bounds. We
-> also cover key algorithmic ideas and proof techniques behind near-optimal
-> instance-dependent methods in OPE and OPL. Finally, we discuss the limitations
-> of offline RL and review a burgeoning problem of *low-adaptive exploration*
-> which addresses these limitations by providing a sweet middle
-> ground between offline and online RL.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary focus criteria, as it discusses reinforcement learning (RL) in offline and low-adaptive settings, with no evident focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Online Detection of Water Contamination Under Concept Drift](https://arxiv.org/abs/http://arxiv.org/abs/2501.02107v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02107v1
-
-**Abstract:**
-> Water Distribution Networks (WDNs) are vital infrastructures, and
-> contamination poses serious public health risks. Harmful substances can
-> interact with disinfectants like chlorine, making chlorine monitoring essential
-> for detecting contaminants. However, chlorine sensors often become unreliable
-> and require frequent calibration. This study introduces the Dual-Threshold
-> Anomaly and Drift Detection (AD&DD) method, an unsupervised approach combining
-> a dual-threshold drift detection mechanism with an LSTM-based Variational
-> Autoencoder (LSTM-VAE) for real-time contamination detection. Tested on two
-> realistic WDNs, AD&DD effectively identifies anomalies with sensor offsets as
-> concept drift, and outperforms other methods. A proposed decentralized
-> architecture enables accurate contamination detection and localization by
-> deploying AD&DD on selected nodes.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on detecting water contamination using an LSTM-based Variational Autoencoder, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Siamese Networks for Cat Re-Identification: Exploring Neural Models for
-  Cat Instance Recognition](https://arxiv.org/abs/http://arxiv.org/abs/2501.02112v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02112v1
-
-**Abstract:**
-> Street cats in urban areas often rely on human intervention for survival,
-> leading to challenges in population control and welfare management. In April
-> 2023, Hello Inc., a Chinese urban mobility company, launched the Hello Street
-> Cat initiative to address these issues. The project deployed over 21,000 smart
-> feeding stations across 14 cities in China, integrating livestreaming cameras
-> and treat dispensers activated through user donations. It also promotes the
-> Trap-Neuter-Return (TNR) method, supported by a community-driven platform,
-> HelloStreetCatWiki, where volunteers catalog and identify cats. However, manual
-> identification is inefficient and unsustainable, creating a need for automated
-> solutions. This study explores Deep Learning-based models for re-identifying
-> street cats in the Hello Street Cat initiative. A dataset of 2,796 images of 69
-> cats was used to train Siamese Networks with EfficientNetB0, MobileNet and
-> VGG16 as base models, evaluated under contrastive and triplet loss functions.
-> VGG16 paired with contrastive loss emerged as the most effective configuration,
-> achieving up to 97% accuracy and an F1 score of 0.9344 during testing. The
-> approach leverages image augmentation and dataset refinement to overcome
-> challenges posed by limited data and diverse visual variations. These findings
-> underscore the potential of automated cat re-identification to streamline
-> population monitoring and welfare efforts. By reducing reliance on manual
-> processes, the method offers a scalable and reliable solution for
-> community-driven initiatives. Future research will focus on expanding datasets
-> and developing real-time implementations to enhance practicality in large-scale
-> deployments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on image recognition for cat re-identification using Siamese Networks and Deep Learning models, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [AVTrustBench: Assessing and Enhancing Reliability and Robustness in
-  Audio-Visual LLMs](https://arxiv.org/abs/http://arxiv.org/abs/2501.02135v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02135v1
-
-**Abstract:**
-> With the rapid advancement of Multi-modal Large Language Models (MLLMs),
-> several diagnostic benchmarks have recently been developed to assess these
-> models' multi-modal reasoning proficiency. However, these benchmarks are
-> restricted to assessing primarily the visual aspect and do not examine the
-> holistic audio-visual (AV) understanding. Moreover, currently, there are no
-> benchmarks that investigate the capabilities of AVLLMs to calibrate their
-> responses when presented with perturbed inputs. To this end, we introduce
-> Audio-Visual Trustworthiness assessment Benchmark (AVTrustBench), comprising
-> 600K samples spanning over 9 meticulously crafted tasks, evaluating the
-> capabilities of AVLLMs across three distinct dimensions: Adversarial attack,
-> Compositional reasoning, and Modality-specific dependency. Using our benchmark
-> we extensively evaluate 13 state-of-the-art AVLLMs. The findings reveal that
-> the majority of existing models fall significantly short of achieving
-> human-like comprehension, offering valuable insights for future research
-> directions. To alleviate the limitations in the existing approaches, we further
-> propose a robust, model-agnostic calibrated audio-visual preference
-> optimization based training strategy CAVPref, obtaining a gain up to 30.19%
-> across all 9 tasks. We will publicly release our code and benchmark to
-> facilitate future research in this direction.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the development and evaluation of a benchmark for Audio-Visual Large Language Models (AVLLMs) and proposes a training strategy (CAVPref) to enhance their reliability and robustness, rather than focusing on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Attribute-Based Robotic Grasping with Data-Efficient Adaptation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02149v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02149v1
-
-**Abstract:**
-> Robotic grasping is one of the most fundamental robotic manipulation tasks
-> and has been the subject of extensive research. However, swiftly teaching a
-> robot to grasp a novel target object in clutter remains challenging. This paper
-> attempts to address the challenge by leveraging object attributes that
-> facilitate recognition, grasping, and rapid adaptation to new domains. In this
-> work, we present an end-to-end encoder-decoder network to learn attribute-based
-> robotic grasping with data-efficient adaptation capability. We first pre-train
-> the end-to-end model with a variety of basic objects to learn generic attribute
-> representation for recognition and grasping. Our approach fuses the embeddings
-> of a workspace image and a query text using a gated-attention mechanism and
-> learns to predict instance grasping affordances. To train the joint embedding
-> space of visual and textual attributes, the robot utilizes object persistence
-> before and after grasping. Our model is self-supervised in a simulation that
-> only uses basic objects of various colors and shapes but generalizes to novel
-> objects in new environments. To further facilitate generalization, we propose
-> two adaptation methods, adversarial adaptation and one-grasp adaptation.
-> Adversarial adaptation regulates the image encoder using augmented data of
-> unlabeled images, whereas one-grasp adaptation updates the overall end-to-end
-> model using augmented data from one grasp trial. Both adaptation methods are
-> data-efficient and considerably improve instance grasping performance.
-> Experimental results in both simulation and the real world demonstrate that our
-> approach achieves over 81% instance grasping success rate on unknown objects,
-> which outperforms several baselines by large margins.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on robotic grasping and adaptation in novel environments, utilizing an encoder-decoder network, with no emphasis on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [The Integration of Blockchain and Artificial Intelligence for Secure
-  Healthcare Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.02169v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02169v1
-
-**Abstract:**
-> Verisign reported a 125 percent increase in data breaches within the
-> healthcare sector in the United States during 2022, with 18.2 million patient
-> records being impacted. Growing healthcare data volumes and diversification
-> mean that medical information is becoming more valuable. Many Health Centers
-> use various technologies to ease the classification, storage, and exchange of
-> big data. This use can also put users' health data at risk and leave it
-> vulnerable. AI and blockchain are among the leading technologies at hand. With
-> AI, data-driven operations and big data efficiency have been improved with
-> respect to traditional techniques. Due to its potential to bring about
-> improvements in health services and lower medical costs, this AI technology is
-> regularly used in healthcare. Blockchain helps protect transactions on sharing
-> information and private privacy as long as the exchange of knowledge is that of
-> the standard. The objective of this analysis is to investigate the research and
-> unique contributions since 2008 regarding blockchain-integrated AI and
-> healthcare systems. The work sheds light on applied AI-based healthcare schemes
-> with machine, ballistic, and acrylic learning and disparate blockchain
-> structures. The use of technology in order to ensure patient data security and
-> manage medical information effectively in healthcare settings offers a highly
-> successful position for both healthcare providers and patients. From 2018 to
-> 2021, the best year was 2021 to grow, enhancing everything to examine the
-> download of the device and the counting of Google Academies, for which the
-> joining perspective was borrowed; local research experts were asked, identified
-> articles in recent years, and read reviews of large research grants.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the integration of blockchain and AI for secure healthcare systems, not on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not provide examples of prompts impacting LLM output.
-
----
-
-## [AdaMixup: A Dynamic Defense Framework for Membership Inference Attack
-  Mitigation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02182v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02182v1
-
-**Abstract:**
-> Membership inference attacks have emerged as a significant privacy concern in
-> the training of deep learning models, where attackers can infer whether a data
-> point was part of the training set based on the model's outputs. To address
-> this challenge, we propose a novel defense mechanism, AdaMixup. AdaMixup
-> employs adaptive mixup techniques to enhance the model's robustness against
-> membership inference attacks by dynamically adjusting the mixup strategy during
-> training. This method not only improves the model's privacy protection but also
-> maintains high performance. Experimental results across multiple datasets
-> demonstrate that AdaMixup significantly reduces the risk of membership
-> inference attacks while achieving a favorable trade-off between defensive
-> efficiency and model accuracy. This research provides an effective solution for
-> data privacy protection and lays the groundwork for future advancements in
-> mixup training methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a defense mechanism (AdaMixup) for mitigating membership inference attacks in deep learning models, with no apparent emphasis on prompt engineering, Large Language Models (LLMs), or the manipulation of textual input prompts for LLM performance.
-
----
-
-## [CPTuning: Contrastive Prompt Tuning for Generative Relation Extraction](https://arxiv.org/abs/http://arxiv.org/abs/2501.02196v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02196v1
-
-**Abstract:**
-> Generative relation extraction (RE) commonly involves first reformulating RE
-> as a linguistic modeling problem easily tackled with pre-trained language
-> models (PLM) and then fine-tuning a PLM with supervised cross-entropy loss.
-> Although having achieved promising performance, existing approaches assume only
-> one deterministic relation between each pair of entities without considering
-> real scenarios where multiple relations may be valid, i.e., entity pair
-> overlap, causing their limited applications. To address this problem, we
-> introduce a novel contrastive prompt tuning method for RE, CPTuning, which
-> learns to associate a candidate relation between two in-context entities with a
-> probability mass above or below a threshold, corresponding to whether the
-> relation exists. Beyond learning schema, CPTuning also organizes RE as a
-> verbalized relation generation task and uses Trie-constrained decoding to
-> ensure a model generates valid relations. It adaptively picks out the generated
-> candidate relations with a high estimated likelihood in inference, thereby
-> achieving multi-relation extraction. We conduct extensive experiments on four
-> widely used datasets to validate our method. Results show that T5-large
-> fine-tuned with CPTuning significantly outperforms previous methods, regardless
-> of single or multiple relations extraction.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper mentions prompt tuning for Large Language Models (LLMs), its primary focus is on improving Generative Relation Extraction through a novel training method, rather than solely on prompt engineering for text-based interactions with LLMs, as evidenced by the emphasis on fine-tuning a PLM and the development of a new decoding strategy.
-
----
-
-## [Financial Named Entity Recognition: How Far Can LLM Go?](https://arxiv.org/abs/http://arxiv.org/abs/2501.02237v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02237v1
-
-**Abstract:**
-> The surge of large language models (LLMs) has revolutionized the extraction
-> and analysis of crucial information from a growing volume of financial
-> statements, announcements, and business news. Recognition for named entities to
-> construct structured data poses a significant challenge in analyzing financial
-> documents and is a foundational task for intelligent financial analytics.
-> However, how effective these generic LLMs are, and how they perform under
-> various prompts, is not yet well understood. To fill this gap, we
-> present a systematic evaluation of state-of-the-art LLMs and prompting methods
-> in the financial Named Entity Recognition (NER) problem. Specifically, our
-> experimental results highlight their strengths and limitations, identify five
-> representative failure types, and provide insights into their potential and
-> challenges for domain-specific tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper investigates LLM performance under various prompts, its primary focus is on evaluating LLMs for a specific application (Financial Named Entity Recognition) rather than on the engineering, design, or optimization of prompts themselves for improving LLM text generation capabilities.
-
----
-
-## [Interpretable Load Forecasting via Representation Learning of
-  Geo-distributed Meteorological Factors](https://arxiv.org/abs/http://arxiv.org/abs/2501.02241v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02241v1
-
-**Abstract:**
-> Meteorological factors (MF) are crucial in day-ahead load forecasting as they
-> significantly influence the electricity consumption behaviors of consumers.
-> Numerous studies have incorporated MF into the load forecasting model to
-> achieve higher accuracy. Selecting MF from one representative location or the
-> averaged MF as the inputs of the forecasting model is a common practice.
-> However, the difference in MF collected in various locations within a region
-> may be significant, which poses a challenge in selecting the appropriate MF
-> from numerous locations. A representation learning framework is proposed to
-> extract geo-distributed MF while considering their spatial relationships. In
-> addition, this paper employs the Shapley value in the graph-based model to
-> reveal connections between MF collected in different locations and loads. To
-> reduce the computational complexity of calculating the Shapley value, an
-> acceleration method is adopted based on Monte Carlo sampling and weighted
-> linear regression. Experiments on two real-world datasets demonstrate that the
-> proposed method improves the day-ahead forecasting accuracy, especially in
-> extreme scenarios such as the "accumulation temperature effect" in summer and
-> "sudden temperature change" in winter. We also find a significant correlation
-> between the importance of MF in different locations and the corresponding
-> area's GDP and mainstay industry.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on interpretable load forecasting using representation learning of geo-distributed meteorological factors, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria."
-}
-
----
-
-## [LLMzSzŁ: a comprehensive LLM benchmark for Polish](https://arxiv.org/abs/http://arxiv.org/abs/2501.02266v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02266v1
-
-**Abstract:**
-> This article introduces the first comprehensive benchmark for the Polish
-> language at this scale: LLMzSzŁ (LLMs Behind the School Desk). It is based
-> on a coherent collection of Polish national exams, including both academic and
-> professional tests extracted from the archives of the Polish Central
-> Examination Board. It covers 4 types of exams, coming from 154 domains.
-> Altogether, it consists of almost 19k closed-ended questions. We investigate
-> the performance of open-source multilingual, English, and Polish LLMs to verify
-> LLMs' abilities to transfer knowledge between languages. Also, the correlation
-> between LLMs and humans at model accuracy and exam pass rate levels is
-> examined. We show that multilingual LLMs can obtain superior results over
-> monolingual ones; however, monolingual models may be beneficial when model size
-> matters. Our analysis highlights the potential of LLMs in assisting with exam
-> validation, particularly in identifying anomalies or errors in examination
-> tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on introducing a benchmark for evaluating LLM performance on Polish language tasks, examining knowledge transfer and correlation with human accuracy, rather than primarily on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [What Kind of Visual Tokens Do We Need? Training-free Visual Token
-  Pruning for Multi-modal Large Language Models from the Perspective of Graph](https://arxiv.org/abs/http://arxiv.org/abs/2501.02268v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02268v1
-
-**Abstract:**
-> Recent Multimodal Large Language Models (MLLMs) often use a large number of
-> visual tokens to compensate for their visual shortcomings, leading to excessive
-> computation and obvious visual redundancy. In this paper, we investigate what
-> kind of visual tokens are needed for MLLMs, and reveal that both foreground and
-> background tokens are critical for MLLMs given the varying difficulties of
-> examples. Based on this observation, we propose a graph-based method towards
-> training-free visual token pruning, termed G-Prune. In particular, G-Prune
-> regards visual tokens as nodes and constructs their connections based on their
-> semantic similarities. Afterwards, the information flow is propagated via
-> weighted links, and the most important tokens after iterations are kept for
-> MLLMs, which can be foreground or background. To validate G-Prune, we apply it
-> to a recent MLLM called LLaVA-NeXT and conduct extensive experiments on a set
-> of benchmarks. The experimental results show that G-Prune can greatly reduce
-> computation overhead while retaining high performance on both coarse- and
-> fine-grained tasks. For instance, G-Prune can reduce 63.57% FLOPs of
-> LLaVA-NeXT on VQA2.0 and TextVQA with only 0.95% and 2.34% accuracy drops,
-> respectively.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on optimizing visual token pruning for Multi-modal Large Language Models (MLLMs), which is more related to model optimization and reducing computational overhead rather than prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-## [Hyperbolic Contrastive Learning for Hierarchical 3D Point Cloud
-  Embedding](https://arxiv.org/abs/http://arxiv.org/abs/2501.02285v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02285v2
-
-**Abstract:**
-> Hyperbolic spaces allow for more efficient modeling of complex, hierarchical
-> structures, which is particularly beneficial in tasks involving multi-modal
-> data. Although hyperbolic geometries have been proven effective for
-> language-image pre-training, their capabilities to unify language, image, and
-> 3D Point Cloud modalities are under-explored. We extend the 3D Point Cloud
-> modality in hyperbolic multi-modal contrastive pre-training. Additionally, we
-> explore the entailment, modality gap, and alignment regularizers for learning
-> hierarchical 3D embeddings and facilitating the transfer of knowledge from both
-> Text and Image modalities. These regularizers enable the learning of
-> intra-modal hierarchy within each modality and inter-modal hierarchy across
-> text, 2D images, and 3D Point Clouds. Experimental results demonstrate that our
-> proposed training strategy yields an outstanding 3D Point Cloud encoder, and
-> the obtained 3D Point Cloud hierarchical embeddings significantly improve
-> performance on various downstream tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-modal contrastive pre-training for 3D Point Cloud embedding, primarily involving image and 3D modalities, with no clear emphasis on prompt engineering or text generation driven by Large Language Models (LLMs).
-
----
-
-## [SR-Reward: Taking The Path More Traveled](https://arxiv.org/abs/http://arxiv.org/abs/2501.02330v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02330v1
-
-**Abstract:**
-> In this paper, we propose a novel method for learning reward functions
-> directly from offline demonstrations. Unlike traditional inverse reinforcement
-> learning (IRL), our approach decouples the reward function from the learner's
-> policy, eliminating the adversarial interaction typically required between the
-> two. This results in a more stable and efficient training process. Our reward
-> function, called *SR-Reward*, leverages successor representation (SR) to
-> encode a state based on expected future states' visitation under the
-> demonstration policy and transition dynamics. By utilizing the Bellman
-> equation, SR-Reward can be learned concurrently with most reinforcement
-> learning (RL) algorithms without altering the existing training pipeline. We
-> also introduce a negative sampling strategy to mitigate overestimation errors
-> by reducing rewards for out-of-distribution data, thereby enhancing robustness.
-> This strategy inherently introduces a conservative bias into RL algorithms that
-> employ the learned reward. We evaluate our method on the D4RL benchmark,
-> achieving competitive results compared to offline RL algorithms with access to
-> true rewards and imitation learning (IL) techniques like behavioral cloning.
-> Moreover, our ablation studies on data size and quality reveal the advantages
-> and limitations of SR-Reward as a proxy for true rewards.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on learning reward functions for reinforcement learning (RL) with offline demonstrations, using successor representation and Bellman equation, and does not meet any of the 'MUST' criteria related to prompt engineering for Large Language Models (LLMs), such as improving LLM performance through textual input prompt manipulation or providing concrete examples of LLM prompts.
-
----
-
-## [AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM
-  Inference](https://arxiv.org/abs/http://arxiv.org/abs/2501.02336v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02336v1
-
-**Abstract:**
-> Long-context large language model (LLM) inference is increasingly critical,
-> motivating a number of studies devoted to alleviating the substantial storage
-> and computational costs in such scenarios. Layer-wise skipping methods are
-> promising optimizations but rarely explored in long-context inference. We
-> observe that existing layer-wise skipping strategies have several limitations
-> when applied in long-context inference, including the inability to adapt to
-> model and context variability, disregard for sublayer significance, and
-> inapplicability for the prefilling phase. This paper proposes AdaSkip, an
-> adaptive sublayer skipping method specifically designed for long-context
-> inference. AdaSkip adaptively identifies less important layers by leveraging
-> on-the-fly similarity information, enables sublayer-wise skipping, and
-> accelerates both the prefilling and decoding phases. The effectiveness of
-> AdaSkip is demonstrated through extensive experiments on various long-context
-> benchmarks and models, showcasing its superior inference performance over
-> existing baselines.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on accelerating long-context LLM inference through adaptive sublayer skipping, which pertains to optimizing LLM architecture and inference methods rather than prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-## [Evaluation of the Code Generation Capabilities of ChatGPT 4: A
-  Comparative Analysis in 19 Programming Languages](https://arxiv.org/abs/http://arxiv.org/abs/2501.02338v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02338v1
-
-**Abstract:**
-> This bachelor's thesis examines the capabilities of ChatGPT 4 in code
-> generation across 19 programming languages. The study analyzed solution rates
-> across three difficulty levels, types of errors encountered, and code quality
-> in terms of runtime and memory efficiency through a quantitative experiment. A
-> total of 188 programming problems were selected from the LeetCode platform, and
-> ChatGPT 4 was given three attempts to produce a correct solution with feedback.
-> ChatGPT 4 successfully solved 39.67% of all tasks, with success rates
-> decreasing significantly as problem complexity increased. Notably, the model
-> faced considerable challenges with hard problems across all languages. ChatGPT
-> 4 demonstrated higher competence in widely used languages, likely due to a
-> larger volume and higher quality of training data. The solution rates also
-> revealed a preference for languages with low abstraction levels and static
-> typing. For popular languages, the most frequent error was "Wrong Answer,"
-> whereas for less popular languages, compiler and runtime errors prevailed,
-> suggesting frequent misunderstandings and confusion regarding the structural
-> characteristics of these languages. The model exhibited above-average runtime
-> efficiency in all programming languages, showing a tendency toward statically
-> typed and low-abstraction languages. Memory efficiency results varied
-> significantly, with above-average performance in 14 languages and below-average
-> performance in five languages. A slight preference for low-abstraction
-> languages and a leaning toward dynamically typed languages in terms of memory
-> efficiency were observed. Future research should include a larger number of
-> tasks, iterations, and less popular languages. Additionally, ChatGPT 4's
-> abilities in code interpretation and summarization, debugging, and the
-> development of complex, practical code could be analyzed further.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on evaluating the code generation capabilities of ChatGPT 4 across various programming languages, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), which is the required core subject."
-}
-
----
-
-## [UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude
-  Mobility](https://arxiv.org/abs/http://arxiv.org/abs/2501.02341v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02341v1
-
-**Abstract:**
-> Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has
-> introduced transformative advancements across various domains, like
-> transportation, logistics, and agriculture. Leveraging flexible perspectives
-> and rapid maneuverability, UAVs extend traditional systems' perception and
-> action capabilities, garnering widespread attention from academia and industry.
-> However, current UAV operations primarily depend on human control, with only
-> limited autonomy in simple scenarios, and lack the intelligence and
-> adaptability needed for more complex environments and tasks. The emergence of
-> large language models (LLMs) demonstrates remarkable problem-solving and
-> generalization capabilities, offering a promising pathway for advancing UAV
-> intelligence. This paper explores the integration of LLMs and UAVs, beginning
-> with an overview of UAV systems' fundamental components and functionalities,
-> followed by an overview of the state-of-the-art in LLM technology.
-> Subsequently, it systematically highlights the multimodal data resources
-> available for UAVs, which provide critical support for training and evaluation.
-> Furthermore, it categorizes and analyzes key tasks and application scenarios
-> where UAVs and LLMs converge. Finally, a reference roadmap towards agentic UAVs
-> is proposed, aiming to enable UAVs to achieve agentic intelligence through
-> autonomous perception, memory, reasoning, and tool utilization. Related
-> resources are available at https://github.com/Hub-Tian/UAVs_Meet_LLMs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on integrating LLMs with UAVs to enhance UAV intelligence, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models, failing to meet the core subject requirement.
-
----
-
-## [Exploring the Capabilities and Limitations of Large Language Models for
-  Radiation Oncology Decision Support](https://arxiv.org/abs/http://arxiv.org/abs/2501.02346v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02346v1
-
-**Abstract:**
-> Thanks to the rapidly evolving integration of LLMs into decision-support
-> tools, a significant transformation is happening across large-scale systems.
-> Like other medical fields, the use of LLMs such as GPT-4 is gaining increasing
-> interest in radiation oncology as well. An attempt to assess GPT-4's
-> performance in radiation oncology was made via a dedicated 100-question
-> examination on the highly specialized topic of radiation oncology physics,
-> revealing GPT-4's superiority over other LLMs. GPT-4's performance on a broader
-> field of clinical radiation oncology is further benchmarked by the ACR
-> Radiation Oncology In-Training (TXIT) exam where GPT-4 achieved a high accuracy
-> of 74.57%. Its performance on re-labelling structure names in accordance with
-> the AAPM TG-263 report has also been benchmarked, achieving accuracy above
-> 96%. Such studies shed light on the potential of LLMs in radiation
-> oncology. As interest in the potential and constraints of LLMs in general
-> healthcare applications continues to rise, the capabilities and limitations of
-> LLMs in radiation oncology decision support have not yet been fully explored.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the application of LLMs in radiation oncology, a medically-oriented subject explicitly excluded by the criteria, rather than the engineering, design, or optimization of prompts for LLMs.
-
----
-
-## [Context Aware Lemmatization and Morphological Tagging Method in Turkish](https://arxiv.org/abs/http://arxiv.org/abs/2501.02361v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02361v1
-
-**Abstract:**
-> The smallest part of a word that defines the word is called a word root. Word
-> roots are used to increase success in many applications since they simplify the
-> word. In this study, the lemmatization model, which is a word root finding
-> method, and the morphological tagging model, which predicts the grammatical
-> knowledge of the word, are presented. The presented model was developed for
-> Turkish, and both models make predictions by taking the meaning of the word
-> into account. In the literature, there is no lemmatization study that is
-> sensitive to word meaning in Turkish. For this reason, the present study shares
-> the model and the results obtained from the model on Turkish lemmatization for
-> the first time in the literature. In the present study, in the lemmatization
-> and morphological tagging models, bidirectional LSTM is used for the spelling
-> of words, and the Turkish BERT model is used for the meaning of words. The
-> models are trained using the IMST and PUD datasets from Universal Dependencies.
-> The results from the training of the models were compared with the results from
-> the SIGMORPHON 2019 competition. The results of the comparisons revealed that
-> our models were superior.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on lemmatization and morphological tagging for the Turkish language using BERT and LSTM, without any mention of Large Language Models (LLMs), prompt engineering, or manipulation of textual input prompts to improve LLM performance, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [Enhancing Workplace Productivity and Well-being Using AI Agent](https://arxiv.org/abs/http://arxiv.org/abs/2501.02368v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02368v1
-
-**Abstract:**
-> This paper discusses the use of Artificial Intelligence (AI) to enhance
-> workplace productivity and employee well-being. By integrating machine learning
-> (ML) techniques with neurobiological data, the proposed approaches ensure
-> alignment with human ethical standards through value alignment models and
-> Hierarchical Reinforcement Learning (HRL) for autonomous task management. The
-> system utilizes biometric feedback from employees to generate personalized
-> health prompts, fostering a supportive work environment that encourages
-> physical activity. Additionally, we explore decentralized multi-agent systems
-> for improved collaboration and decision-making frameworks that enhance
-> transparency. Various approaches using ML techniques in conjunction with AI
-> implementations are discussed. Together, these innovations aim to create a more
-> productive and health-conscious workplace. These outcomes assist HR management
-> and organizations in launching more rational career progression streams for
-> employees and facilitating organizational transformation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on using AI for workplace productivity and well-being, integrating various technologies, but does not centrally address prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance, as required.
-
----
-
-## [Syntactic Evolution in Language Usage](https://arxiv.org/abs/http://arxiv.org/abs/2501.02392v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02392v1
-
-**Abstract:**
-> This research aims to investigate the dynamic nature of linguistic style
-> throughout various stages of life, from post-teenage to old age. By employing
-> linguistic analysis tools and methodologies, the study will delve into the
-> intricacies of how individuals adapt and modify their language use over time.
-> The research uses a data set of blogs from blogger.com from 2004 and focuses on
-> English for syntactic analysis. The findings of this research can have
-> implications for linguistics, psychology, and communication studies, shedding
-> light on the intricate relationship between age and language.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on the evolution of linguistic style across different age groups, using linguistic analysis tools, and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through prompt manipulation, nor does it provide concrete examples of prompts and their impact on LLM output."
-}
-
----
-
-## [iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic
-  Traits](https://arxiv.org/abs/http://arxiv.org/abs/2501.02401v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02401v1
-
-**Abstract:**
-> Accurately predicting chronological age from DNA methylation patterns is
-> crucial for advancing biological age estimation. However, this task is made
-> challenging by Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs
-> (HAC), which reflect the dynamic relationship between methylation and age
-> across different life stages. To address these issues, we propose a novel
-> two-phase algorithm. The first phase employs similarity searching to cluster
-> methylation profiles by age group, while the second phase uses Explainable
-> Boosting Machines (EBM) for precise, group-specific prediction. Our method not
-> only improves prediction accuracy but also reveals key age-related CpG sites,
-> detects age-specific changes in aging rates, and identifies pairwise
-> interactions between CpG sites. Experimental results show that our approach
-> outperforms traditional epigenetic clocks and machine learning models, offering
-> a more accurate and interpretable solution for biological age estimation with
-> significant implications for aging research.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it proposes a novel algorithm for biological age estimation from DNA methylation patterns, unrelated to LLMs or prompt engineering.
-
----
-
-## [Boosting Explainability through Selective Rationalization in Pre-trained
-  Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03182v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03182v1
-
-**Abstract:**
-> The widespread application of pre-trained language models (PLMs) in natural
-> language processing (NLP) has led to increasing concerns about their
-> explainability. Selective rationalization is a self-explanatory framework that
-> selects human-intelligible input subsets as rationales for predictions. Recent
-> studies have shown that applying existing rationalization frameworks to PLMs
-> will result in severe degeneration and failure problems, producing sub-optimal
-> or meaningless rationales. Such failures severely damage trust in
-> rationalization methods and constrain the application of rationalization
-> techniques on PLMs. In this paper, we find that the homogeneity of tokens in
-> the sentences produced by PLMs is the primary contributor to these problems. To
-> address these challenges, we propose a method named Pre-trained Language
-> Model's Rationalization (PLMR), which splits PLMs into a generator and a
-> predictor to deal with NLP tasks while providing interpretable rationales. The
-> generator in PLMR also alleviates homogeneity by pruning irrelevant tokens,
-> while the predictor uses full-text information to standardize predictions.
-> Experiments conducted on two widely used datasets across multiple PLMs
-> demonstrate the effectiveness of the proposed method PLMR in addressing the
-> challenge of applying selective rationalization to PLMs. Codes:
-> https://github.com/ylb777/PLMR.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on improving the explainability of Pre-trained Language Models (PLMs) through selective rationalization, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs). The core subject is not prompt engineering for text-based interactions with LLMs, but rather addressing challenges in applying rationalization techniques to PLMs.
-
----
-
-## [AI-ANNE: (A) (N)eural (N)et for (E)xploration: Transferring Deep
-  Learning Models onto Microcontrollers and Embedded Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.03256v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03256v1
-
-**Abstract:**
-> This working paper explores the integration of neural networks onto
-> resource-constrained embedded systems like a Raspberry Pi Pico / Raspberry Pi
-> Pico 2. A TinyML approach transfers neural networks directly onto these
-> microcontrollers, enabling real-time, low-latency, and energy-efficient
-> inference while maintaining data privacy. Therefore, AI-ANNE: (A) (N)eural
-> (N)et for (E)xploration will be presented, which facilitates the transfer of
-> pre-trained models from high-performance platforms like TensorFlow and Keras
-> onto microcontrollers, using a lightweight programming language like
-> MicroPython. This approach demonstrates how neural network architectures, such
-> as neurons, layers, density and activation functions can be implemented in
-> MicroPython in order to deal with the computational limitations of embedded
-> systems. Based on the Raspberry Pi Pico / Raspberry Pi Pico 2, two different
-> neural networks on microcontrollers are presented for an example of data
-> classification. As a further application example, such a microcontroller can
-> be used for condition monitoring, where immediate corrective measures are
-> triggered on the basis of sensor data. Overall, this working paper presents a
-> very easy-to-implement way of using neural networks on energy-efficient devices
-> such as microcontrollers. This makes AI-ANNE: (A) (N)eural (N)et for
-> (E)xploration not only suited for practical use, but also as an educational
-> tool with clear insights into how neural networks operate.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on transferring deep learning models onto microcontrollers and embedded systems, with no emphasis on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System
-  Strategies](https://arxiv.org/abs/http://arxiv.org/abs/2501.03265v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03265v1
-
-**Abstract:**
-> The emergence of 5G and edge computing hardware has brought about a
-> significant shift in artificial intelligence, with edge AI becoming a crucial
-> technology for enabling intelligent applications. With the growing amount of
-> data generated and stored on edge devices, deploying AI models for local
-> processing and inference has become increasingly necessary. However, deploying
-> state-of-the-art AI models on resource-constrained edge devices faces
-> significant challenges that must be addressed. This paper presents an
-> optimization triad for efficient and reliable edge AI deployment, including
-> data, model, and system optimization. First, we discuss optimizing data through
-> data cleaning, compression, and augmentation to make it more suitable for edge
-> deployment. Second, we explore model design and compression methods at the
-> model level, such as pruning, quantization, and knowledge distillation.
-> Finally, we introduce system optimization techniques like framework support and
-> hardware acceleration to accelerate edge AI workflows. Based on an in-depth
-> analysis of various application scenarios and deployment challenges of edge AI,
-> this paper proposes an optimization paradigm based on the data-model-system
-> triad to enable a whole set of solutions to effectively transfer ML models,
-> which are initially trained in the cloud, to various edge devices for
-> supporting multiple scenarios.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs), instead concentrating on optimizing data, model, and system strategies for edge AI deployment, which falls outside the specified criteria.
-
----
-
-## [Heterogeneous Graph Pre-training Based Model for Secure and Efficient
-  Prediction of Default Risk Propagation among Bond Issuers](https://arxiv.org/abs/http://arxiv.org/abs/2501.03268v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03268v1
-
-**Abstract:**
-> Efficient prediction of default risk for bond-issuing enterprises is pivotal
-> for maintaining stability and fostering growth in the bond market. Conventional
-> methods usually rely solely on an enterprise's internal data for risk
-> assessment. In contrast, graph-based techniques leverage interconnected
-> corporate information to enhance default risk identification for targeted bond
-> issuers. Traditional graph techniques such as label propagation algorithm or
-> deepwalk fail to effectively integrate an enterprise's inherent attribute
-> information with its topological network data. Additionally, due to data
-> scarcity and security privacy concerns between enterprises, end-to-end graph
-> neural network (GNN) algorithms may struggle in delivering satisfactory
-> performance for target tasks. To address these challenges, we present a novel
-> two-stage model. In the first stage, we employ an innovative Masked
-> Autoencoders for Heterogeneous Graph (HGMAE) to pre-train on a vast enterprise
-> knowledge graph. Subsequently, in the second stage, a specialized classifier
-> model is trained to predict default risk propagation probabilities. The
-> classifier leverages concatenated feature vectors derived from the pre-trained
-> encoder with the enterprise's task-specific feature vectors. Through the
-> two-stage training approach, our model not only boosts the importance of unique
-> bond characteristics for specific default prediction tasks, but also securely
-> and efficiently leverages the global information pre-trained from other
-> enterprises. Experimental results demonstrate that our proposed model
-> outperforms existing approaches in predicting default risk for bond issuers.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a two-stage model for predicting default risk propagation among bond issuers using heterogeneous graph pre-training, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Knowledge-Guided Biomarker Identification for Label-Free Single-Cell
-  RNA-Seq Data: A Reinforcement Learning Perspective](https://arxiv.org/abs/http://arxiv.org/abs/2501.04718v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04718v1
-
-**Abstract:**
-> Gene panel selection aims to identify the most informative genomic biomarkers
-> in label-free genomic datasets. Traditional approaches, which rely on domain
-> expertise, embedded machine learning models, or heuristic-based iterative
-> optimization, often introduce biases and inefficiencies, potentially obscuring
-> critical biological signals. To address these challenges, we present an
-> iterative gene panel selection strategy that harnesses ensemble knowledge from
-> existing gene selection algorithms to establish preliminary boundaries or prior
-> knowledge, which guide the initial search space. Subsequently, we incorporate
-> reinforcement learning through a reward function shaped by expert behavior,
-> enabling dynamic refinement and targeted selection of gene panels. This
-> integration mitigates biases stemming from initial boundaries while
-> capitalizing on RL's stochastic adaptability. Comprehensive comparative
-> experiments, case studies, and downstream analyses demonstrate the
-> effectiveness of our method, highlighting its improved precision and efficiency
-> for label-free biomarker discovery. Our results underscore the potential of
-> this approach to advance single-cell genomics data analysis.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on applying reinforcement learning for biomarker identification in genomics, with no apparent emphasis on prompt engineering, Large Language Models (LLMs), or manipulation of textual input prompts for LLM performance improvement."
-}
-
----
-
-## [Calculating Customer Lifetime Value and Churn using Beta Geometric
-  Negative Binomial and Gamma-Gamma Distribution in a NFT based setting](https://arxiv.org/abs/http://arxiv.org/abs/2501.04719v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04719v1
-
-**Abstract:**
-> Customer Lifetime Value (CLV) is an important metric that measures the total
-> value a customer will bring to a business over their lifetime. The Beta
-> Geometric Negative Binomial Distribution (BGNBD) and Gamma Gamma Distribution
-> are two models that can be used to calculate CLV, taking into account both the
-> frequency and value of customer transactions. This article explains the BGNBD
-> and Gamma Gamma Distribution models, and how they can be used to calculate CLV
-> for NFT (Non-Fungible Token) transaction data in a blockchain setting. By
-> estimating the parameters of these models using historical transaction data,
-> businesses can gain insights into the lifetime value of their customers and
-> make data-driven decisions about marketing and customer retention strategies.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on calculating Customer Lifetime Value using statistical models in an NFT setting, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Multi-task Domain Adaptation for Computation Offloading in
-  Edge-intelligence Networks](https://arxiv.org/abs/http://arxiv.org/abs/2501.07585v1)
-**arXiv ID:** http://arxiv.org/abs/2501.07585v1
-
-**Abstract:**
-> In the field of multi-access edge computing (MEC), efficient computation
-> offloading is crucial for improving resource utilization and reducing latency
-> in dynamically changing environments. This paper introduces a new approach,
-> termed as Multi-Task Domain Adaptation (MTDA), aiming to enhance the ability of
-> computational offloading models to generalize in the presence of domain shifts,
-> i.e., when new data in the target environment significantly differs from the
-> data in the source domain. The proposed MTDA model incorporates a
-> teacher-student architecture that allows continuous adaptation without
-> necessitating access to the source domain data during inference, thereby
-> maintaining privacy and reducing computational overhead. Utilizing a multi-task
-> learning framework that simultaneously manages offloading decisions and
-> resource allocation, the proposed MTDA approach outperforms benchmark methods
-> regarding mean squared error and accuracy, particularly in environments with
-> increasing numbers of users. It is observed by means of computer simulation
-> that the proposed MTDA model maintains high performance across various
-> scenarios, demonstrating its potential for practical deployment in emerging MEC
-> applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses primarily on computation offloading in edge-intelligence networks, multi-task domain adaptation, and resource allocation, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thereby failing to address the core subject of prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Adjoint sharding for very long context training of state space models](https://arxiv.org/abs/http://arxiv.org/abs/2501.00692v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00692v1
-
-**Abstract:**
-> Despite very fast progress, efficiently training large language models (LLMs)
-> in very long contexts remains challenging. Existing methods fall back to
-> training LLMs with short contexts (a maximum of a few thousand tokens in
-> training) and use inference time techniques when evaluating on long contexts
-> (above 1M tokens context window at inference). As opposed to
-> long-context-inference, training on very long context input prompts is quickly
-> limited by GPU memory availability and by the prohibitively long training times
-> it requires on state-of-the-art hardware. Meanwhile, many real-life
-> applications require not only inference but also training/fine-tuning with long
-> context on specific tasks. Such applications include, for example, augmenting
-> the context with various sources of raw reference information for fact
-> extraction, fact summarization, or fact reconciliation tasks. We propose
-> adjoint sharding, a novel technique that comprises sharding gradient
-> calculation during training to reduce memory requirements by orders of
-> magnitude, making training on very long context computationally tractable.
-> Adjoint sharding is based on the adjoint method and computes equivalent
-> gradients to backpropagation. We also propose truncated adjoint sharding to
-> speed up the algorithm while maintaining performance. We provide a distributed
-> version, and a paralleled version of adjoint sharding to further speed up
-> training. Empirical results show the proposed adjoint sharding algorithm
-> reduces memory usage by up to 3X with a 1.27B parameter large language model on
-> 1M context length training. This allows increasing the maximum context length
-> during training or fine-tuning of a 1.27B parameter model from 35K tokens to
-> above 100K tokens on a training infrastructure composed of five AWS P4
-> instances.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on a novel technique for reducing memory requirements during LLM training (adjoint sharding) rather than engineering, designing, or optimizing prompts specifically for improving LLM performance through textual input manipulation.
-
----
-
-## [Everywhere Attack: Attacking Locally and Globally to Boost Targeted
-  Transferability](https://arxiv.org/abs/http://arxiv.org/abs/2501.00707v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00707v1
-
-**Abstract:**
-> Adversarial examples' (AE) transferability refers to the phenomenon that AEs
-> crafted with one surrogate model can also fool other models. Notwithstanding
-> remarkable progress in untargeted transferability, its targeted counterpart
-> remains challenging. This paper proposes an everywhere scheme to boost targeted
-> transferability. Our idea is to attack a victim image both globally and
-> locally. We aim to optimize 'an army of targets' in every local image region
-> instead of the previous works that optimize a high-confidence target in the
-> image. Specifically, we split a victim image into non-overlap blocks and
-> jointly mount a targeted attack on each block. Such a strategy mitigates
-> transfer failures caused by attention inconsistency between surrogate and
-> victim models and thus results in stronger transferability. Our approach is
-> method-agnostic, which means it can be easily combined with existing
-> transferable attacks for even higher transferability. Extensive experiments on
-> ImageNet demonstrate that the proposed approach universally improves the
-> state-of-the-art targeted attacks by a clear margin, e.g., the transferability
-> of the widely adopted Logit attack can be improved by 28.8%-300%. We also
-> evaluate the crafted AEs on a real-world platform: Google Cloud Vision. Results
-> further support the superiority of the proposed method.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on improving the transferability of adversarial examples in image classification, with no relevance to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [An AI-powered Bayesian generative modeling approach for causal inference
-  in observational studies](https://arxiv.org/abs/http://arxiv.org/abs/2501.00755v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00755v1
-
-**Abstract:**
-> Causal inference in observational studies with high-dimensional covariates
-> presents significant challenges. We introduce CausalBGM, an AI-powered Bayesian
-> generative modeling approach that captures the causal relationship among
-> covariates, treatment, and outcome variables. The core innovation of CausalBGM
-> lies in its ability to estimate the individual treatment effect (ITE) by
-> learning individual-specific distributions of a low-dimensional latent feature
-> set (e.g., latent confounders) that drives changes in both treatment and
-> outcome. This approach not only effectively mitigates confounding effects but
-> also provides comprehensive uncertainty quantification, offering reliable and
-> interpretable causal effect estimates at the individual level. CausalBGM adopts
-> a Bayesian model and uses a novel iterative algorithm to update the model
-> parameters and the posterior distribution of latent features until convergence.
-> This framework leverages the power of AI to capture complex dependencies among
-> variables while adhering to the Bayesian principles. Extensive experiments
-> demonstrate that CausalBGM consistently outperforms state-of-the-art methods,
-> particularly in scenarios with high-dimensional covariates and large-scale
-> datasets. Its Bayesian foundation ensures statistical rigor, providing robust
-> and well-calibrated posterior intervals. By addressing key limitations of
-> existing methods, CausalBGM emerges as a robust and promising framework for
-> advancing causal inference in modern applications in fields such as genomics,
-> healthcare, and social sciences. CausalBGM is maintained at the website
-> https://causalbgm.readthedocs.io/.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a Bayesian generative modeling approach for causal inference in observational studies, leveraging AI, but does not primarily address the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Revisiting Graph Neural Networks on Graph-level Tasks: Comprehensive
-  Experiments, Analysis, and Improvements](https://arxiv.org/abs/http://arxiv.org/abs/2501.00773v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00773v1
-
-**Abstract:**
-> Graphs are essential data structures for modeling complex interactions in
-> domains such as social networks, molecular structures, and biological systems.
-> Graph-level tasks, which predict properties or classes for the entire graph,
-> are critical for applications, such as molecular property prediction and
-> subgraph counting. Graph Neural Networks (GNNs) have shown promise in these
-> tasks, but their evaluations are often limited to narrow datasets, tasks, and
-> inconsistent experimental setups, restricting their generalizability. To
-> address these limitations, we propose a unified evaluation framework for
-> graph-level GNNs. This framework provides a standardized setting to evaluate
-> GNNs across diverse datasets, various graph tasks (e.g., graph classification
-> and regression), and challenging scenarios, including noisy, imbalanced, and
-> few-shot graphs. Additionally, we propose a novel GNN model with enhanced
-> expressivity and generalization capabilities. Specifically, we enhance the
-> expressivity of GNNs through a $k$-path rooted subgraph approach, enabling the
-> model to effectively count subgraphs (e.g., paths and cycles). Moreover, we
-> introduce a unified graph contrastive learning algorithm for graphs across
-> diverse domains, which adaptively removes unimportant edges to augment graphs,
-> thereby significantly improving generalization performance. Extensive
-> experiments demonstrate that our model achieves superior performance against
-> fourteen effective baselines across twenty-seven graph datasets, establishing
-> it as a robust and generalizable model for graph-level tasks.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the core criteria as it focuses primarily on the development and optimization of Graph Neural Networks (GNNs) for graph-level tasks, with no indication of Large Language Models (LLMs), prompt engineering, or text generation being the central focus."
-}
-
----
-
-## [LENS-XAI: Redefining Lightweight and Explainable Network Security
-  through Knowledge Distillation and Variational Autoencoders for Scalable
-  Intrusion Detection in Cybersecurity](https://arxiv.org/abs/http://arxiv.org/abs/2501.00790v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00790v2
-
-**Abstract:**
-> The rapid proliferation of Industrial Internet of Things (IIoT) systems
-> necessitates advanced, interpretable, and scalable intrusion detection systems
-> (IDS) to combat emerging cyber threats. Traditional IDS face challenges such as
-> high computational demands, limited explainability, and inflexibility against
-> evolving attack patterns. To address these limitations, this study introduces
-> the Lightweight Explainable Network Security framework (LENS-XAI), which
-> combines robust intrusion detection with enhanced interpretability and
-> scalability. LENS-XAI integrates knowledge distillation, variational
-> autoencoder models, and attribution-based explainability techniques to achieve
-> high detection accuracy and transparency in decision-making. By leveraging a
-> training set comprising 10% of the available data, the framework optimizes
-> computational efficiency without sacrificing performance. Experimental
-> evaluation on four benchmark datasets: Edge-IIoTset, UKM-IDS20, CTU-13, and
-> NSL-KDD, demonstrates the framework's superior performance, achieving detection
-> accuracies of 95.34%, 99.92%, 98.42%, and 99.34%, respectively. Additionally,
-> the framework excels in reducing false positives and adapting to complex attack
-> scenarios, outperforming existing state-of-the-art methods. Key strengths of
-> LENS-XAI include its lightweight design, suitable for resource-constrained
-> environments, and its scalability across diverse IIoT and cybersecurity
-> contexts. Moreover, the explainability module enhances trust and transparency,
-> critical for practical deployment in dynamic and sensitive applications. This
-> research contributes significantly to advancing IDS by addressing computational
-> efficiency, feature interpretability, and real-world applicability. Future work
-> could focus on extending the framework to ensemble AI systems for distributed
-> environments, further enhancing its robustness and adaptability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a lightweight and explainable network security framework (LENS-XAI) for intrusion detection in cybersecurity, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Decoupling Knowledge and Reasoning in Transformers: A Modular
-  Architecture with Generalized Cross-Attention](https://arxiv.org/abs/http://arxiv.org/abs/2501.00823v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00823v2
-
-**Abstract:**
-> Transformers have achieved remarkable success across diverse domains, but
-> their monolithic architecture presents challenges in interpretability,
-> adaptability, and scalability. This paper introduces a novel modular
-> Transformer architecture that explicitly decouples knowledge and reasoning
-> through a generalized cross-attention mechanism to a globally shared knowledge
-> base with layer-specific transformations, specifically designed for effective
-> knowledge retrieval. Critically, we provide a rigorous mathematical derivation
-> demonstrating that the Feed-Forward Network (FFN) in a standard Transformer is
-> a specialized case (a closure) of this generalized cross-attention, revealing
-> its role in implicit knowledge retrieval and validating our design. This
-> theoretical framework provides a new lens for understanding FFNs and lays the
-> foundation for future research exploring enhanced interpretability,
-> adaptability, and scalability, enabling richer interplay with external
-> knowledge bases and other systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a novel modular Transformer architecture, decoupling knowledge and reasoning, which falls under developing new LLM architectures, violating the 'MUST NOT' criteria (1). Prompt engineering for text-based interactions with LLMs is not the central focus.
-
----
-
-## [What is a Social Media Bot? A Global Comparison of Bot and Human
-  Characteristics](https://arxiv.org/abs/http://arxiv.org/abs/2501.00855v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00855v1
-
-**Abstract:**
-> Chatter on social media is 20% bots and 80% humans. Chatter by bots and
-> humans is consistently different: bots tend to use linguistic cues that can be
-> easily automated while humans use cues that require dialogue understanding.
-> Bots use words that match the identities they choose to present, while humans
-> may send messages that are not related to the identities they present. Bots and
-> humans differ in their communication structure: sampled bots have a star
-> interaction structure, while sampled humans have a hierarchical structure.
-> These conclusions are based on a large-scale analysis of social media tweets
-> across ~200mil users across 7 events. Social media bots took the world by storm
-> when social-cybersecurity researchers realized that social media users not only
-> consisted of humans but also of artificial agents called bots. These bots wreak
-> havoc online by spreading disinformation and manipulating narratives. Most
-> research on bots is based on special-purpose definitions, mostly predicated
-> on the event studied. This article first begins by asking, "What is a bot?",
-> and we study the underlying principles of how bots are different from humans.
-> We develop a first-principle definition of a social media bot. With this
-> definition as a premise, we systematically compare characteristics between bots
-> and humans across global events, and reflect on how the software-programmed bot
-> is an Artificial Intelligence algorithm, and its potential for evolution as
-> technology advances. Based on our results, we provide recommendations for the
-> use and regulation of bots. Finally, we discuss open challenges and future
-> directions: Detect, to systematically identify these automated and potentially
-> evolving bots; Differentiate, to evaluate the goodness of the bot in terms of
-> their content postings and relationship interactions; Disrupt, to moderate the
-> impact of malicious bots.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it primarily focuses on analyzing social media bot characteristics, their differentiation from humans, and regulatory recommendations, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [DiffETM: Diffusion Process Enhanced Embedded Topic Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.00862v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00862v1
-
-**Abstract:**
-> The embedded topic model (ETM) is a widely used approach that assumes the
-> sampled document-topic distribution conforms to the logistic normal
-> distribution for easier optimization. However, this assumption oversimplifies
-> the real document-topic distribution, limiting the model's performance. In
-> response, we propose a novel method that introduces the diffusion process into
-> the sampling process of document-topic distribution to overcome this limitation
-> and maintain an easy optimization process. We validate our method through
-> extensive experiments on two mainstream datasets, proving its effectiveness in
-> improving topic modeling performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs), instead proposing a method for enhancing an Embedded Topic Model using a diffusion process, which does not meet the primary criteria of investigating, analyzing, or proposing methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Representation in large language models](https://arxiv.org/abs/http://arxiv.org/abs/2501.00885v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00885v1
-
-**Abstract:**
-> The extraordinary success of recent Large Language Models (LLMs) on a diverse
-> array of tasks has led to an explosion of scientific and philosophical
-> theorizing aimed at explaining how they do what they do. Unfortunately,
-> disagreement over fundamental theoretical issues has led to stalemate, with
-> entrenched camps of LLM optimists and pessimists often committed to very
-> different views of how these systems work. Overcoming stalemate requires
-> agreement on fundamental questions, and the goal of this paper is to address
-> one such question, namely: is LLM behavior driven partly by
-> representation-based information processing of the sort implicated in
-> biological cognition, or is it driven entirely by processes of memorization and
-> stochastic table look-up? This is a question about what kind of algorithm LLMs
-> implement, and the answer carries serious implications for higher level
-> questions about whether these systems have beliefs, intentions, concepts,
-> knowledge, and understanding. I argue that LLM behavior is partially driven by
-> representation-based information processing, and then I describe and defend a
-> series of practical techniques for investigating these representations and
-> developing explanations on their basis. The resulting account provides a
-> groundwork for future theorizing about language models and their successors.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on the theoretical foundations and internal workings of Large Language Models (LLMs), specifically whether LLM behavior is driven by representation-based information processing. While it touches on understanding LLMs, it does not primarily investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts, nor does it provide concrete examples of prompts and their impact on LLM output."
-}
-
----
-
-## [Demystifying Online Clustering of Bandits: Enhanced Exploration Under
-  Stochastic and Smoothed Adversarial Contexts](https://arxiv.org/abs/http://arxiv.org/abs/2501.00891v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00891v1
-
-**Abstract:**
-> The contextual multi-armed bandit (MAB) problem is crucial in sequential
-> decision-making. A line of research, known as online clustering of bandits,
-> extends contextual MAB by grouping similar users into clusters, utilizing
-> shared features to improve learning efficiency. However, existing algorithms,
-> which rely on the upper confidence bound (UCB) strategy, struggle to gather
-> adequate statistical information to accurately identify unknown user clusters.
-> As a result, their theoretical analyses require several strong assumptions
-> about the "diversity" of contexts generated by the environment, leading to
-> impractical settings, complicated analyses, and poor practical performance.
-> Removing these assumptions has been a long-standing open problem in the
-> clustering of bandits literature. In this paper, we provide two solutions to
-> this open problem. First, following the i.i.d. context generation setting in
-> existing studies, we propose two novel algorithms, UniCLUB and PhaseUniCLUB,
-> which incorporate enhanced exploration mechanisms to accelerate cluster
-> identification. Remarkably, our algorithms require substantially weaker
-> assumptions while achieving regret bounds comparable to prior work. Second,
-> inspired by the smoothed analysis framework, we propose a more practical
-> setting that eliminates the requirement for i.i.d. context generation used in
-> previous studies, thus enhancing the performance of existing algorithms for
-> online clustering of bandits. Our technique can be applied to both graph-based
-> and set-based clustering of bandits frameworks. Extensive evaluations on both
-> synthetic and real-world datasets demonstrate that our proposed algorithms
-> consistently outperform existing approaches.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on online clustering of bandits, multi-armed bandit problems, and sequential decision-making, with no primary focus on engineering, design, or optimization of prompts for Large Language Models (LLMs) or manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Large Language Model Based Multi-Agent System Augmented Complex Event
-  Processing Pipeline for Internet of Multimedia Things](https://arxiv.org/abs/http://arxiv.org/abs/2501.00906v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00906v2
-
-**Abstract:**
-> This paper presents the development and evaluation of a Large Language Model
-> (LLM), also known as foundation models, based multi-agent system framework for
-> complex event processing (CEP) with a focus on video query processing use
-> cases. The primary goal is to create a proof-of-concept (POC) that integrates
-> state-of-the-art LLM orchestration frameworks with publish/subscribe (pub/sub)
-> tools to address the integration of LLMs with current CEP systems. Utilizing
-> the Autogen framework in conjunction with Kafka message brokers, the system
-> demonstrates an autonomous CEP pipeline capable of handling complex workflows.
-> Extensive experiments evaluate the system's performance across varying
-> configurations, complexities, and video resolutions, revealing the trade-offs
-> between functionality and latency. The results show that while higher agent
-> count and video complexities increase latency, the system maintains high
-> consistency in narrative coherence. This research builds upon and contributes
-> to, existing novel approaches to distributed AI systems, offering detailed
-> insights into integrating such systems into existing infrastructures.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on integrating Large Language Models into a multi-agent system for complex event processing, not on prompt engineering for text-based interactions with LLMs, failing to meet the core subject requirement.
-
----
-
-## [Enhancing Early Diabetic Retinopathy Detection through Synthetic DR1
-  Image Generation: A StyleGAN3 Approach](https://arxiv.org/abs/http://arxiv.org/abs/2501.00954v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00954v1
-
-**Abstract:**
-> Diabetic Retinopathy (DR) is a leading cause of preventable blindness. Early
-> detection at the DR1 stage is critical but is hindered by a scarcity of
-> high-quality fundus images. This study uses StyleGAN3 to generate synthetic DR1
-> images characterized by microaneurysms with high fidelity and diversity. The
-> aim is to address data scarcity and enhance the performance of supervised
-> classifiers. A dataset of 2,602 DR1 images was used to train the model,
-> followed by a comprehensive evaluation using quantitative metrics, including
-> Frechet Inception Distance (FID), Kernel Inception Distance (KID), and
-> Equivariance with respect to translation (EQ-T) and rotation (EQ-R).
-> Qualitative assessments included Human Turing tests, where trained
-> ophthalmologists evaluated the realism of synthetic images. Spectral analysis
-> further validated image quality. The model achieved a final FID score of 17.29,
-> outperforming the mean FID of 21.18 (95 percent confidence interval - 20.83 to
-> 21.56) derived from bootstrap resampling. Human Turing tests demonstrated the
-> model's ability to produce highly realistic images, though minor artifacts near
-> the borders were noted. These findings suggest that StyleGAN3-generated
-> synthetic DR1 images hold significant promise for augmenting training datasets,
-> enabling more accurate early detection of Diabetic Retinopathy. This
-> methodology highlights the potential of synthetic data in advancing medical
-> imaging and AI-driven diagnostics.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on medical imaging (Diabetic Retinopathy detection) using StyleGAN3 for synthetic image generation, and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts, thus failing to meet the core 'MUST' criteria.
-
----
-
-## [The Silent Majority: Demystifying Memorization Effect in the Presence of
-  Spurious Correlations](https://arxiv.org/abs/http://arxiv.org/abs/2501.00961v2)
-**arXiv ID:** http://arxiv.org/abs/2501.00961v2
-
-**Abstract:**
-> Machine learning models often rely on simple spurious features -- patterns in
-> training data that correlate with targets but are not causally related to them,
-> like image backgrounds in foreground classification. This reliance typically
-> leads to imbalanced test performance across minority and majority groups. In
-> this work, we take a closer look at the fundamental cause of such imbalanced
-> performance through the lens of memorization, which refers to the ability to
-> predict accurately on \textit{atypical} examples (minority groups) in the
-> training set but failing in achieving the same accuracy in the testing set.
-> This paper systematically shows the ubiquitous existence of spurious features
-> in a small set of neurons within the network, providing the first-ever evidence
-> that memorization may contribute to imbalanced group performance. Through three
-> experimental sources of converging empirical evidence, we find the property of
-> a small subset of neurons or channels in memorizing minority group information.
-> Inspired by these findings, we articulate the hypothesis: the imbalanced group
-> performance is a byproduct of ``noisy'' spurious memorization confined to a
-> small set of neurons. To further substantiate this hypothesis, we show that
-> eliminating these unnecessary spurious memorization patterns via a novel
-> framework during training can significantly affect the model performance on
-> minority groups. Our experimental results across various architectures and
-> benchmarks offer new insights on how neural networks encode core and spurious
-> knowledge, laying the groundwork for future research in demystifying robustness
-> to spurious correlation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria, as it primarily focuses on understanding memorization effects and spurious correlations in machine learning models, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [FlashInfer: Efficient and Customizable Attention Engine for LLM
-  Inference Serving](https://arxiv.org/abs/http://arxiv.org/abs/2501.01005v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01005v1
-
-**Abstract:**
-> Transformers, driven by attention mechanisms, form the foundation of large
-> language models (LLMs). As these models scale up, efficient GPU attention
-> kernels become essential for high-throughput and low-latency inference. Diverse
-> LLM applications demand flexible and high-performance attention solutions. We
-> present FlashInfer: a customizable and efficient attention engine for LLM
-> serving. FlashInfer tackles KV-cache storage heterogeneity using block-sparse
-> format and composable formats to optimize memory access and reduce redundancy.
-> It also offers a customizable attention template, enabling adaptation to
-> various settings through Just-In-Time (JIT) compilation. Additionally,
-> FlashInfer's load-balanced scheduling algorithm adjusts to dynamism of user
-> requests while maintaining compatibility with CUDAGraph which requires static
-> configuration. FlashInfer has been integrated into leading LLM serving
-> frameworks like SGLang, vLLM and MLC-Engine. Comprehensive kernel-level and
-> end-to-end evaluations demonstrate FlashInfer's ability to significantly boost
-> kernel performance across diverse inference scenarios: compared to
-> state-of-the-art LLM serving solutions, FlashInfer achieves 29-69%
-> inter-token-latency reduction compared to compiler backends for LLM serving
-> benchmark, 28-30% latency reduction for long-context inference, and 13-17%
-> speedup for LLM serving with parallel generation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing an efficient attention engine for Large Language Model (LLM) inference serving, which falls under the development of new LLM architectures or training methods, and does not centrally address prompt engineering for text-based interactions with LLMs.
-
----
-
-## [CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price
-  Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.01010v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01010v1
-
-**Abstract:**
-> Predicting Bitcoin price remains a challenging problem due to the high
-> volatility and complex non-linear dynamics of cryptocurrency markets.
-> Traditional time-series models, such as ARIMA and GARCH, and recurrent neural
-> networks, like LSTMs, have been widely applied to this task but struggle to
-> capture the regime shifts and long-range dependencies inherent in the data. In
-> this work, we propose CryptoMamba, a novel Mamba-based State Space Model (SSM)
-> architecture designed to effectively capture long-range dependencies in
-> financial time-series data. Our experiments show that CryptoMamba not only
-> provides more accurate predictions but also offers enhanced generalizability
-> across different market conditions, surpassing the limitations of previous
-> models. Coupled with trading algorithms for real-world scenarios, CryptoMamba
-> demonstrates its practical utility by translating accurate forecasts into
-> financial outcomes. Our findings signal a huge advantage for SSMs in stock and
-> cryptocurrency price forecasting tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a State Space Model (SSM) for Bitcoin price prediction, with no primary emphasis on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented
-  Contextual Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01031v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01031v2
-
-**Abstract:**
-> Cultural values alignment in Large Language Models (LLMs) is a critical
-> challenge due to their tendency to embed Western-centric biases from training
-> data, leading to misrepresentations and fairness issues in cross-cultural
-> contexts. Recent approaches, such as role-assignment and few-shot learning,
-> often struggle with reliable cultural alignment as they heavily rely on
-> pre-trained knowledge, lack scalability, and fail to capture nuanced cultural
-> values effectively. To address these issues, we propose ValuesRAG, a novel and
-> effective framework that applies Retrieval-Augmented Generation (RAG) with
-> In-Context Learning (ICL) to integrate cultural and demographic knowledge
-> dynamically during text generation. Leveraging the World Values Survey (WVS)
-> dataset, ValuesRAG first generates summaries of values for each individual.
-> Subsequently, we curate several representative regional datasets to serve as
-> test datasets and retrieve relevant summaries of values based on demographic
-> features, followed by a reranking step to select the top-k relevant summaries.
-> ValuesRAG consistently outperforms baseline methods, both in the main
-> experiment and in the ablation study where only the values summary was
-> provided. Notably, ValuesRAG demonstrates an accuracy of 21% improvement over
-> other baseline methods, highlighting its potential to foster culturally aligned
-> AI systems and enhance the inclusivity of AI-driven applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing cultural alignment in LLMs through Retrieval-Augmented Generation and In-Context Learning, rather than specifically investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for
-  Driving Perception](https://arxiv.org/abs/http://arxiv.org/abs/2501.01037v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01037v1
-
-**Abstract:**
-> Multi-sensor fusion models play a crucial role in autonomous driving
-> perception, particularly in tasks like 3D object detection and HD map
-> construction. These models provide essential and comprehensive static
-> environmental information for autonomous driving systems. While camera-LiDAR
-> fusion methods have shown promising results by integrating data from both
-> modalities, they often depend on complete sensor inputs. This reliance can lead
-> to low robustness and potential failures when sensors are corrupted or missing,
-> raising significant safety concerns. To tackle this challenge, we introduce the
-> Multi-Sensor Corruption Benchmark (MSC-Bench), the first comprehensive
-> benchmark aimed at evaluating the robustness of multi-sensor autonomous driving
-> perception models against various sensor corruptions. Our benchmark includes 16
-> combinations of corruption types that disrupt both camera and LiDAR inputs,
-> either individually or concurrently. Extensive evaluations of six 3D object
-> detection models and four HD map construction models reveal substantial
-> performance degradation under adverse weather conditions and sensor failures,
-> underscoring critical safety issues. The benchmark toolkit and affiliated code
-> and model checkpoints have been made publicly accessible.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on autonomous driving perception, multi-sensor fusion, and benchmarking for robustness against sensor corruptions, with no mention or emphasis on prompt engineering for Large Language Models (LLMs) or text generation, thus failing to meet the 'MUST' criteria.
-
----
-
-## [MMVA: Multimodal Matching Based on Valence and Arousal across Images,
-  Music, and Musical Captions](https://arxiv.org/abs/http://arxiv.org/abs/2501.01094v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01094v1
-
-**Abstract:**
-> We introduce Multimodal Matching based on Valence and Arousal (MMVA), a
-> tri-modal encoder framework designed to capture emotional content across
-> images, music, and musical captions. To support this framework, we expand the
-> Image-Music-Emotion-Matching-Net (IMEMNet) dataset, creating IMEMNet-C which
-> includes 24,756 images and 25,944 music clips with corresponding musical
-> captions. We employ multimodal matching scores based on the continuous valence
-> (emotional positivity) and arousal (emotional intensity) values. This
-> continuous matching score allows for random sampling of image-music pairs
-> during training by computing similarity scores from the valence-arousal values
-> across different modalities. Consequently, the proposed approach achieves
-> state-of-the-art performance in valence-arousal prediction tasks. Furthermore,
-> the framework demonstrates its efficacy in various zero-shot tasks, highlighting
-> the potential of valence and arousal predictions in downstream applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multimodal matching across images, music, and musical captions, primarily for valence and arousal prediction, with no clear emphasis on the engineering, design, or optimization of textual input prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Disambiguation of Chinese Polyphones in an End-to-End Framework with
-  Semantic Features Extracted by Pre-trained BERT](https://arxiv.org/abs/http://arxiv.org/abs/2501.01102v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01102v1
-
-**Abstract:**
-> Grapheme-to-phoneme (G2P) conversion serves as an essential component in
-> Chinese Mandarin text-to-speech (TTS) system, where polyphone disambiguation is
-> the core issue. In this paper, we propose an end-to-end framework to predict
-> the pronunciation of a polyphonic character, which accepts sentence containing
-> polyphonic character as input in the form of Chinese character sequence without
-> the necessity of any preprocessing. The proposed method consists of a
-> pre-trained bidirectional encoder representations from Transformers (BERT)
-> model and a neural network (NN) based classifier. The pre-trained BERT model
-> extracts semantic features from a raw Chinese character sequence and the NN
-> based classifier predicts the polyphonic character's pronunciation according to
-> BERT output. In our experiments, we implemented three classifiers, a
-> fully-connected network based classifier, a long short-term memory (LSTM)
-> network based classifier and a Transformer block based classifier. The
-> experimental results compared with the baseline approach based on LSTM
-> demonstrate that, the pre-trained model extracts effective semantic features,
-> which greatly enhances the performance of polyphone disambiguation. In
-> addition, we also explored the impact of contextual information on polyphone
-> disambiguation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on polyphone disambiguation in Chinese text-to-speech (TTS) systems using pre-trained BERT for semantic feature extraction, which does not meet the primary criteria of focusing on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or investigating methods for improving LLM performance through prompt manipulation.
-
----
-
-## [learning discriminative features from spectrograms using center loss for
-  speech emotion recognition](https://arxiv.org/abs/http://arxiv.org/abs/2501.01103v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01103v1
-
-**Abstract:**
-> Identifying the emotional state from speech is essential for the natural
-> interaction of the machine with the speaker. However, extracting effective
-> features for emotion recognition is difficult, as emotions are ambiguous. We
-> propose a novel approach to learn discriminative features from variable length
-> spectrograms for emotion recognition by cooperating softmax cross-entropy loss
-> and center loss together. The softmax cross-entropy loss enables features from
-> different emotion categories separable, and center loss efficiently pulls the
-> features belonging to the same emotion category to their center. By combining
-> the two losses together, the discriminative power will be highly enhanced,
-> which leads to network learning more effective features for emotion
-> recognition. As demonstrated by the experimental results, after introducing
-> center loss, both the unweighted accuracy and weighted accuracy are improved by
-> over 3\% on Mel-spectrogram input, and more than 4\% on Short Time Fourier
-> Transform spectrogram input.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it primarily focuses on speech emotion recognition using spectrograms and novel loss functions for feature learning, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, thereby failing to meet the core subject requirement.
-
----
-
-## [Robust COVID-19 Detection from Cough Sounds using Deep Neural Decision
-  Tree and Forest: A Comprehensive Cross-Datasets Evaluation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01117v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01117v1
-
-**Abstract:**
-> This research presents a robust approach to classifying COVID-19 cough sounds
-> using cutting-edge machine-learning techniques. Leveraging deep neural decision
-> trees and deep neural decision forests, our methodology demonstrates consistent
-> performance across diverse cough sound datasets. We begin with a comprehensive
-> extraction of features to capture a wide range of audio features from
-> individuals, whether COVID-19 positive or negative. To determine the most
-> important features, we use recursive feature elimination along with
-> cross-validation. Bayesian optimization fine-tunes hyper-parameters of deep
-> neural decision tree and deep neural decision forest models. Additionally, we
-> integrate the SMOTE during training to ensure a balanced representation of
-> positive and negative data. Model performance refinement is achieved through
-> threshold optimization, maximizing the ROC-AUC score. Our approach undergoes a
-> comprehensive evaluation in five datasets: Cambridge, Coswara, COUGHVID,
-> Virufy, and the combined Virufy with the NoCoCoDa dataset. Consistently
-> outperforming state-of-the-art methods, our proposed approach yields notable
-> AUC scores of 0.97, 0.98, 0.92, 0.93, 0.99, and 0.99 across the respective
-> datasets. Merging all datasets into a combined dataset, our method, using a
-> deep neural decision forest classifier, achieves an AUC of 0.97. Also, our
-> study includes a comprehensive cross-datasets analysis, revealing demographic
-> and geographic differences in the cough sounds associated with COVID-19. These
-> differences highlight the challenges in transferring learned features across
-> diverse datasets and underscore the potential benefits of dataset integration,
-> improving generalizability and enhancing COVID-19 detection from audio signals.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on COVID-19 detection from cough sounds using deep neural decision trees and forests, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [TED: Turn Emphasis with Dialogue Feature Attention for Emotion
-  Recognition in Conversation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01123v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01123v1
-
-**Abstract:**
-> Emotion recognition in conversation (ERC) has been attracting attention by
-> methods for modeling multi-turn contexts. The multi-turn input to a pretraining
-> model implicitly assumes that the current turn and other turns are
-> distinguished during the training process by inserting special tokens into the
-> input sequence. This paper proposes a priority-based attention method to
-> distinguish each turn explicitly by adding dialogue features into the attention
-> mechanism, called Turn Emphasis with Dialogue (TED). It has a priority for each
-> turn according to turn position and speaker information as dialogue features.
-> It takes multi-head self-attention between turn-based vectors for multi-turn
-> input and adjusts attention scores with the dialogue features. We evaluate TED
-> on four typical benchmarks. The experimental results demonstrate that TED has
-> high overall performance in all datasets and achieves state-of-the-art
-> performance on IEMOCAP with numerous turns.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on improving emotion recognition in conversation using a priority-based attention method (TED), not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output."
-}
-
----
-
-## [Missing Data as Augmentation in the Earth Observation Domain: A
-  Multi-View Learning Approach](https://arxiv.org/abs/http://arxiv.org/abs/2501.01132v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01132v1
-
-**Abstract:**
-> Multi-view learning (MVL) leverages multiple sources or views of data to
-> enhance machine learning model performance and robustness. This approach has
-> been successfully used in the Earth Observation (EO) domain, where views have a
-> heterogeneous nature and can be affected by missing data. Despite the negative
-> effect that missing data has on model predictions, the ML literature has used
-> it as an augmentation technique to improve model generalization, like masking
-> the input data. Inspired by this, we introduce novel methods for EO
-> applications tailored to MVL with missing views. Our methods integrate the
-> combination of a set to simulate all combinations of missing views as different
-> training samples. Instead of replacing missing data with a numerical value, we
-> use dynamic merge functions, like average, and more complex ones like
-> Transformer. This allows the MVL model to entirely ignore the missing views,
-> enhancing its predictive robustness. We experiment on four EO datasets with
-> temporal and static views, including state-of-the-art methods from the EO
-> domain. The results indicate that our methods improve model robustness under
-> conditions of moderate missingness, and improve the predictive performance when
-> all views are present. The proposed methods offer a single adaptive solution to
-> operate effectively with any combination of available views.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-view learning for Earth Observation (EO) with missing data augmentation, lacking any direct connection to prompt engineering, Large Language Models (LLMs), or text generation, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions](https://arxiv.org/abs/http://arxiv.org/abs/2501.01156v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01156v1
-
-**Abstract:**
-> While generative models such as text-to-image, large language models and
-> text-to-video have seen significant progress, the extension to
-> text-to-virtual-reality remains largely unexplored, due to a deficit in
-> training data and the complexity of achieving realistic depth and motion in
-> virtual environments. This paper proposes an approach to coalesce existing
-> generative systems to form a stereoscopic virtual reality video from text.
-> Carried out in three main stages, we start with a base text-to-image model
-> that captures context from an input text. We then employ Stable Diffusion on
-> the rudimentary image produced, to generate frames with enhanced realism and
-> overall quality. These frames are processed with depth estimation algorithms to
-> create left-eye and right-eye views, which are stitched side-by-side to create
-> an immersive viewing experience. Such systems would be highly beneficial in
-> virtual reality production, since filming and scene building often require
-> extensive hours of work and post-production effort.
-> We utilize image evaluation techniques, specifically Fr\'echet Inception
-> Distance and CLIP Score, to assess the visual quality of frames produced for
-> the video. These quantitative measures establish the proficiency of the
-> proposed method.
-> Our work highlights the exciting possibilities of using natural
-> language-driven graphics in fields like virtual reality simulations.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on generating stereoscopic VR video clips from text descriptions, which falls under image/video generation rather than text generation driven by Large Language Models (LLMs), and does not meet the criteria of investigating prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Harnessing Multi-Agent LLMs for Complex Engineering Problem-Solving: A
-  Framework for Senior Design Projects](https://arxiv.org/abs/http://arxiv.org/abs/2501.01205v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01205v1
-
-**Abstract:**
-> Multi-Agent Large Language Models (LLMs) are gaining significant attention
-> for their ability to harness collective intelligence in complex
-> problem-solving, decision-making, and planning tasks. This aligns with the
-> concept of the wisdom of crowds, where diverse agents contribute collectively
-> to generating effective solutions, making it particularly suitable for
-> educational settings. Senior design projects, also known as capstone or final
-> year projects, are pivotal in engineering education as they integrate
-> theoretical knowledge with practical application, fostering critical thinking,
-> teamwork, and real-world problem-solving skills. In this paper, we explore the
-> use of Multi-Agent LLMs in supporting these senior design projects undertaken
-> by engineering students, which often involve multidisciplinary considerations
-> and conflicting objectives, such as optimizing technical performance while
-> addressing ethical, social, and environmental concerns. We propose a framework
-> where distinct LLM agents represent different expert perspectives, such as
-> problem formulation agents, system complexity agents, societal and ethical
-> agents, or project managers, thus facilitating a holistic problem-solving
-> approach. This implementation leverages standard multi-agent system (MAS)
-> concepts such as coordination, cooperation, and negotiation, incorporating
-> prompt engineering to develop diverse personas for each agent. These agents
-> engage in rich, collaborative dialogues to simulate human engineering teams,
-> guided by principles from swarm AI to efficiently balance individual
-> contributions towards a unified solution. We adapt these techniques to create a
-> collaboration structure for LLM agents, encouraging interdisciplinary reasoning
-> and negotiation similar to real-world senior design projects. To assess the
-> efficacy of this framework, we collected six proposals of engineering and
-> computer science of...
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on utilizing Multi-Agent LLMs for complex engineering problem-solving in educational settings, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models, as evidenced by its emphasis on a framework for senior design projects and multi-agent system concepts.
-
----
-
-## [Face-Human-Bench: A Comprehensive Benchmark of Face and Human
-  Understanding for Multi-modal Assistants](https://arxiv.org/abs/http://arxiv.org/abs/2501.01243v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01243v2
-
-**Abstract:**
-> Faces and humans are crucial elements in social interaction and are widely
-> included in everyday photos and videos. Therefore, a deep understanding of
-> faces and humans will enable multi-modal assistants to achieve improved
-> response quality and broadened application scope. Currently, the multi-modal
-> assistant community lacks a comprehensive and scientific evaluation of face and
-> human understanding abilities. In this paper, we first propose a hierarchical
-> ability taxonomy that includes three levels of abilities. Then, based on this
-> taxonomy, we collect images and annotations from publicly available datasets in
-> the face and human community and build a semi-automatic data pipeline to
-> produce problems for the new benchmark. Finally, the obtained Face-Human-Bench
-> comprises a development set with 900 problems and a test set with 1800
-> problems, supporting both English and Chinese. We conduct evaluations over 25
-> mainstream multi-modal large language models (MLLMs) with our Face-Human-Bench,
-> focusing on the correlation between abilities, the impact of the relative
-> position of targets on performance, and the impact of Chain of Thought (CoT)
-> prompting on performance. Moreover, inspired by multi-modal agents, we also
-> explore which abilities of MLLMs need to be supplemented by specialist models.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on evaluating face and human understanding abilities for multi-modal assistants, not specifically on engineering, design, or optimization of prompts for Large Language Models (LLMs). While it mentions evaluating the impact of Chain of Thought (CoT) prompting, this is not the central concern, but rather one aspect of a broader evaluation."
-}
-
----
-
-## [ProgCo: Program Helps Self-Correction of Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01264v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01264v1
-
-**Abstract:**
-> Self-Correction aims to enable large language models (LLMs) to self-verify
-> and self-refine their initial responses without external feedback. However,
-> LLMs often fail to effectively self-verify and generate correct feedback,
-> further misleading refinement and leading to the failure of self-correction,
-> especially in complex reasoning tasks. In this paper, we propose Program-driven
-> Self-Correction (ProgCo). First, program-driven verification (ProgVe) achieves
-> complex verification logic and extensive validation through self-generated,
-> self-executing verification pseudo-programs. Then, program-driven refinement
-> (ProgRe) receives feedback from ProgVe, conducts dual reflection and refinement
-> on both responses and verification programs to mitigate misleading of incorrect
-> feedback in complex reasoning tasks. Experiments on three instruction-following
-> and mathematical benchmarks indicate that ProgCo achieves effective
-> self-correction, and can further enhance performance when combined with real
-> program tools.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a self-correction methodology for Large Language Models (LLMs) using program-driven approaches, rather than specifically engineering, designing, or optimizing textual input prompts to improve LLM performance, not meeting the core 'MUST' criteria.
-
----
-
-## [CultureVLM: Characterizing and Improving Cultural Understanding of
-  Vision-Language Models for over 100 Countries](https://arxiv.org/abs/http://arxiv.org/abs/2501.01282v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01282v1
-
-**Abstract:**
-> Vision-language models (VLMs) have advanced human-AI interaction but struggle
-> with cultural understanding, often misinterpreting symbols, gestures, and
-> artifacts due to biases in predominantly Western-centric training data. In this
-> paper, we construct CultureVerse, a large-scale multimodal benchmark covering
-> 19,682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3
-> question types, with the aim of characterizing and improving VLMs'
-> multicultural understanding capabilities. Then, we propose CultureVLM, a series
-> of VLMs fine-tuned on our dataset to achieve significant performance
-> improvement in cultural understanding. Our evaluation of 16 models reveals
-> significant disparities, with a stronger performance in Western concepts and
-> weaker results in African and Asian contexts. Fine-tuning on our CultureVerse
-> enhances cultural perception, demonstrating cross-cultural, cross-continent,
-> and cross-dataset generalization without sacrificing performance on models'
-> general VLM benchmarks. We further present insights on cultural generalization
-> and forgetting. We hope that this work could lay the foundation for more
-> equitable and culturally aware multimodal AI systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on fine-tuning Vision-Language Models (VLMs) for improved cultural understanding, which does not meet the 'MUST' criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs). Additionally, the paper's core subject is VLMs and cultural understanding, not prompt engineering for text-based interactions with LLMs.
-
----
-
-## [LEO-Split: A Semi-Supervised Split Learning Framework over LEO Satellite
-  Networks](https://arxiv.org/abs/http://arxiv.org/abs/2501.01293v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01293v1
-
-**Abstract:**
-> Recently, the increasing deployment of LEO satellite systems has enabled
-> various space analytics (e.g., crop and climate monitoring), which heavily
-> relies on the advancements in deep learning (DL). However, the intermittent
-> connectivity between LEO satellites and ground station (GS) significantly
-> hinders the timely transmission of raw data to GS for centralized learning,
-> while the scaled-up DL models hamper distributed learning on
-> resource-constrained LEO satellites. Though split learning (SL) can be a
-> potential solution to these problems by partitioning a model and offloading
-> primary training workload to GS, the labor-intensive labeling process remains
-> an obstacle, with intermittent connectivity and data heterogeneity being other
-> challenges. In this paper, we propose LEO-Split, a semi-supervised (SS) SL
-> design tailored for satellite networks to combat these challenges. Leveraging
-> SS learning to handle (labeled) data scarcity, we construct an auxiliary model
-> to tackle the training failure of the satellite-GS non-contact time. Moreover,
-> we propose a pseudo-labeling algorithm to rectify data imbalances across
-> satellites. Lastly, an adaptive activation interpolation scheme is devised to
-> prevent the overfitting of server-side sub-model training at GS. Extensive
-> experiments with real-world LEO satellite traces (e.g., Starlink) demonstrate
-> that our LEO-Split framework achieves superior performance compared to
-> state-of-the-art benchmarks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a semi-supervised split learning framework for LEO satellite networks, with an emphasis on deep learning (DL) model optimization and data transmission challenges, rather than engineering or optimizing prompts for Large Language Models (LLMs).
-
----
-
-## [CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for
-  Benchmarking Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01335v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01335v1
-
-**Abstract:**
-> Numerous studies have investigated methods for jailbreaking Large Language
-> Models (LLMs) to generate harmful content. Typically, these methods are
-> evaluated using datasets of malicious prompts designed to bypass security
-> policies established by LLM providers. However, the generally broad scope and
-> open-ended nature of existing datasets can complicate the assessment of
-> jailbreaking effectiveness, particularly in specific domains, notably
-> cybersecurity. To address this issue, we present and publicly release
-> CySecBench, a comprehensive dataset containing 12662 prompts specifically
-> designed to evaluate jailbreaking techniques in the cybersecurity domain. The
-> dataset is organized into 10 distinct attack-type categories, featuring
-> close-ended prompts to enable a more consistent and accurate assessment of
-> jailbreaking attempts. Furthermore, we detail our methodology for dataset
-> generation and filtration, which can be adapted to create similar datasets in
-> other domains. To demonstrate the utility of CySecBench, we propose and
-> evaluate a jailbreaking approach based on prompt obfuscation. Our experimental
-> results show that this method successfully elicits harmful content from
-> commercial black-box LLMs, achieving Success Rates (SRs) of 65% with ChatGPT
-> and 88% with Gemini; in contrast, Claude demonstrated greater resilience with a
-> jailbreaking SR of 17%. Compared to existing benchmark approaches, our method
-> shows superior performance, highlighting the value of domain-specific
-> evaluation datasets for assessing LLM security measures. Moreover, when
-> evaluated using prompts from a widely used dataset (i.e., AdvBench), it
-> achieved an SR of 78.5%, higher than the state-of-the-art methods.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on evaluating and bypassing Large Language Models' security measures in the cybersecurity domain, rather than on the engineering, design, or optimization of prompts specifically for improving LLM performance through textual input manipulation."
-}
-
----
-
-## [ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding](https://arxiv.org/abs/http://arxiv.org/abs/2501.01366v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01366v1
-
-**Abstract:**
-> 3D visual grounding (3DVG) involves localizing entities in a 3D scene
-> referred to by natural language text. Such models are useful for embodied AI
-> and scene retrieval applications, which involve searching for objects or
-> patterns using natural language descriptions. While recent works have focused
-> on LLM-based scaling of 3DVG datasets, these datasets do not capture the full
-> range of potential prompts which could be specified in the English language. To
-> ensure that we are scaling up and testing against a useful and representative
-> set of prompts, we propose a framework for linguistically analyzing 3DVG
-> prompts and introduce Visual Grounding with Diverse Language in 3D (ViGiL3D), a
-> diagnostic dataset for evaluating visual grounding methods against a diverse
-> set of language patterns. We evaluate existing open-vocabulary 3DVG methods to
-> demonstrate that these methods are not yet proficient in understanding and
-> identifying the targets of more challenging, out-of-distribution prompts,
-> toward real-world applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper uses LLMs and discusses the importance of diverse language prompts for 3D visual grounding, its primary focus is on creating a diagnostic dataset for evaluating visual grounding methods, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) to improve their text generation performance.
-
----
-
-## [Contrastive Learning from Exploratory Actions: Leveraging Natural
-  Interactions for Preference Elicitation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01367v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01367v1
-
-**Abstract:**
-> People have a variety of preferences for how robots behave. To understand and
-> reason about these preferences, robots aim to learn a reward function that
-> describes how aligned robot behaviors are with a user's preferences. Good
-> representations of a robot's behavior can significantly reduce the time and
-> effort required for a user to teach the robot their preferences. Specifying
-> these representations -- what "features" of the robot's behavior matter to
-> users -- remains a difficult problem; features learned from raw data lack
-> semantic meaning and features learned from user data require users to engage in
-> tedious labeling processes. Our key insight is that users tasked with
-> customizing a robot are intrinsically motivated to produce labels through
-> exploratory search; they explore behaviors that they find interesting and
-> ignore behaviors that are irrelevant. To harness this novel data source of
-> exploratory actions, we propose contrastive learning from exploratory actions
-> (CLEA) to learn trajectory features that are aligned with features that users
-> care about. We learned CLEA features from exploratory actions users performed
-> in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated
-> CLEA features through a second user study with a different set of users (N=42).
-> CLEA features outperformed self-supervised features when eliciting user
-> preferences over four metrics: completeness, simplicity, minimality, and
-> explainability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on learning features for preference elicitation in human-robot interaction, specifically leveraging user exploratory actions, and does not primarily investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts.
-
----
-
-## [ScarNet: A Novel Foundation Model for Automated Myocardial Scar
-  Quantification from LGE in Cardiac MRI](https://arxiv.org/abs/http://arxiv.org/abs/2501.01372v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01372v1
-
-**Abstract:**
-> Background: Late Gadolinium Enhancement (LGE) imaging is the gold standard
-> for assessing myocardial fibrosis and scarring, with left ventricular (LV) LGE
-> extent predicting major adverse cardiac events (MACE). Despite its importance,
-> routine LGE-based LV scar quantification is hindered by labor-intensive manual
-> segmentation and inter-observer variability. Methods: We propose ScarNet, a
-> hybrid model combining a transformer-based encoder from the Medical Segment
-> Anything Model (MedSAM) with a convolution-based U-Net decoder, enhanced by
-> tailored attention blocks. ScarNet was trained on 552 ischemic cardiomyopathy
-> patients with expert segmentations of myocardial and scar boundaries and tested
-> on 184 separate patients. Results: ScarNet achieved robust scar segmentation in
-> 184 test patients, yielding a median Dice score of 0.912 (IQR: 0.863--0.944),
-> significantly outperforming MedSAM (median Dice = 0.046, IQR: 0.043--0.047) and
-> nnU-Net (median Dice = 0.638, IQR: 0.604--0.661). ScarNet demonstrated lower
-> bias (-0.63%) and coefficient of variation (4.3%) compared to MedSAM (bias:
-> -13.31%, CoV: 130.3%) and nnU-Net (bias: -2.46%, CoV: 20.3%). In Monte Carlo
-> simulations with noise perturbations, ScarNet achieved significantly higher
-> scar Dice (0.892 ± 0.053, CoV = 5.9%) than MedSAM (0.048 ± 0.112, CoV =
-> 233.3%) and nnU-Net (0.615 ± 0.537, CoV = 28.7%). Conclusion: ScarNet
-> outperformed MedSAM and nnU-Net in accurately segmenting myocardial and scar
-> boundaries in LGE images. The model exhibited robust performance across diverse
-> image qualities and scar patterns.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a novel foundation model (ScarNet) for medical image analysis, rather than engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not investigate or propose methods for improving LLM performance through textual input prompts.
-
----
-
-## [Training Medical Large Vision-Language Models with Abnormal-Aware
-  Feedback](https://arxiv.org/abs/http://arxiv.org/abs/2501.01377v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01377v1
-
-**Abstract:**
-> Existing Medical Large Vision-Language Models (Med-LVLMs), which encapsulate
-> extensive medical knowledge, demonstrate excellent capabilities in
-> understanding medical images and responding to human queries based on these
-> images. However, there remain challenges in visual localization in medical
-> images, which is crucial for abnormality detection and interpretation. To
-> address these issues, we propose a novel UMed-LVLM designed with Unveiling
-> Medical abnormalities. Specifically, we collect a Medical Abnormalities
-> Unveiling (MAU) dataset and propose a two-stage training method for UMed-LVLM
-> training. To collect MAU dataset, we propose a prompt method utilizing the
-> GPT-4V to generate diagnoses based on identified abnormal areas in medical
-> images. Moreover, the two-stage training method includes Abnormal-Aware
-> Instruction Tuning and Abnormal-Aware Rewarding, comprising Abnormal
-> Localization Rewarding and Vision Relevance Rewarding. Experimental results
-> demonstrate that our UMed-LVLM surpasses existing Med-LVLMs in identifying and
-> understanding medical abnormality. In addition, this work shows that enhancing
-> the abnormality detection capabilities of Med-LVLMs significantly improves
-> their understanding of medical images and generalization capability.
-
-**Decision Explanation:**
-Original decision: REJECT
-This paper primarily focuses on the development of Medical Large Vision-Language Models (Med-LVLMs) for abnormality detection in medical images, which falls under excluded categories (medical subjects and non-text generation driven by LLMs), and does not centrally focus on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Multi-Modal Video Feature Extraction for Popularity Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.01422v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01422v1
-
-**Abstract:**
-> This work aims to predict the popularity of short videos using the videos
-> themselves and their related features. Popularity is measured by four key
-> engagement metrics: view count, like count, comment count, and share count.
-> This study employs video classification models with different architectures and
-> training methods as backbone networks to extract video modality features.
-> Meanwhile, the cleaned video captions are incorporated into a carefully
-> designed prompt framework, along with the video, as input for video-to-text
-> generation models, which generate detailed text-based video content
-> understanding. These texts are then encoded into vectors using a pre-trained
-> BERT model. Based on the six sets of vectors mentioned above, a neural network
-> is trained for each of the four prediction metrics. Moreover, the study
-> conducts data mining and feature engineering based on the video and tabular
-> data, constructing practical features such as the total frequency of hashtag
-> appearances, the total frequency of mention appearances, video duration, frame
-> count, frame rate, and total time online. Multiple machine learning models are
-> trained, and the most stable model, XGBoost, is selected. Finally, the
-> predictions from the neural network and XGBoost models are averaged to obtain
-> the final result.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on multi-modal video feature extraction for popularity prediction, with LLMs used only as a component (video-to-text generation) and not specifically for text-based interactions with prompt engineering as the central concern, thus failing to meet the 'MUST' criteria.
-
----
-
-## [Object-level Visual Prompts for Compositional Image Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01424v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01424v1
-
-**Abstract:**
-> We introduce a method for composing object-level visual prompts within a
-> text-to-image diffusion model. Our approach addresses the task of generating
-> semantically coherent compositions across diverse scenes and styles, similar to
-> the versatility and expressiveness offered by text prompts. A key challenge in
-> this task is to preserve the identity of the objects depicted in the input
-> visual prompts, while also generating diverse compositions across different
-> images. To address this challenge, we introduce a new KV-mixed cross-attention
-> mechanism, in which keys and values are learned from distinct visual
-> representations. The keys are derived from an encoder with a small bottleneck
-> for layout control, whereas the values come from a larger bottleneck encoder
-> that captures fine-grained appearance details. By mixing keys and values from
-> these complementary sources, our model preserves the identity of the visual
-> prompts while supporting flexible variations in object arrangement, pose, and
-> composition. During inference, we further propose object-level compositional
-> guidance to improve the method's identity preservation and layout correctness.
-> Results show that our technique produces diverse scene compositions that
-> preserve the unique characteristics of each visual prompt, expanding the
-> creative potential of text-to-image generation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on text-to-image generation with visual prompts, not large language models (LLMs) or text-based interactions, violating the 'MUST NOT' criteria for applications of generative AI other than text generation driven by LLMs.
-
----
-
-## [Enhancing Reasoning through Process Supervision with Monte Carlo Tree
-  Search](https://arxiv.org/abs/http://arxiv.org/abs/2501.01478v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01478v1
-
-**Abstract:**
-> Large language models (LLMs) have demonstrated their remarkable capacity
-> across a variety of tasks. However, reasoning remains a challenge for LLMs. To
-> improve LLMs' reasoning ability, process supervision has proven to be better
-> than outcome supervision. In this work, we study using Monte Carlo Tree Search
-> (MCTS) to generate process supervision data with LLMs themselves for training
-> them. We sample reasoning steps with an LLM and assign each step a score that
-> captures its "relative correctness," and the LLM is then trained by minimizing
-> weighted log-likelihood of generating the reasoning steps. This
-> generate-then-train process is repeated iteratively until convergence. Our
-> experimental results demonstrate that the proposed methods considerably improve
-> the performance of LLMs on two mathematical reasoning datasets. Furthermore,
-> models trained on one dataset also exhibit improved performance on the other,
-> showing the transferability of the enhanced reasoning ability.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on improving LLMs' reasoning ability through process supervision with Monte Carlo Tree Search, which is a training method enhancement rather than a prompt engineering technique for text-based interactions with LLMs, violating MUST NOT criterion 1."
-}
-
----
-
-## [ORACLE: A Real-Time, Hierarchical, Deep-Learning Photometric Classifier
-  for the LSST](https://arxiv.org/abs/http://arxiv.org/abs/2501.01496v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01496v1
-
-**Abstract:**
-> We present ORACLE, the first hierarchical deep-learning model for real-time,
-> context-aware classification of transient and variable astrophysical phenomena.
-> ORACLE is a recurrent neural network with Gated Recurrent Units (GRUs), and has
-> been trained using a custom hierarchical cross-entropy loss function to provide
-> high-confidence classifications along an observationally-driven taxonomy with
-> as little as a single photometric observation. Contextual information for each
-> object, including host galaxy photometric redshift, offset, ellipticity and
-> brightness, is concatenated to the light curve embedding and used to make a
-> final prediction. Training on ~0.5M events from the Extended LSST
-> Astronomical Time-Series Classification Challenge, we achieve a top-level
-> (Transient vs Variable) macro-averaged precision of 0.96 using only 1 day of
-> photometric observations after the first detection in addition to contextual
-> information, for each event; this increases to >0.99 once 64 days of the
-> light curve has been obtained, and 0.83 at 1024 days after first detection for
-> 19-way classification (including supernova sub-types, active galactic nuclei,
-> variable stars, microlensing events, and kilonovae). We also compare ORACLE
-> with other state-of-the-art classifiers and report comparable performance for
-> the 19-way classification task, in addition to delivering accurate top-level
-> classifications much earlier. The code and model weights used in this work are
-> publicly available at our associated GitHub repository
-> (https://github.com/uiucsn/ELAsTiCC-Classification).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet any of the 'MUST' criteria, as it focuses on developing a deep-learning model for astrophysical phenomena classification, not on prompt engineering for Large Language Models (LLMs), and doesn't investigate, analyze, or propose methods for improving LLM performance through textual input prompts.
-
----
-
-## [Transfer Learning Analysis of Variational Quantum Circuits](https://arxiv.org/abs/http://arxiv.org/abs/2501.01507v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01507v1
-
-**Abstract:**
-> This work analyzes transfer learning of the Variational Quantum Circuit
-> (VQC). Our framework begins with a pretrained VQC configured in one domain and
-> calculates the transition of 1-parameter unitary subgroups required for a new
-> domain. A formalism is established to investigate the adaptability and
-> capability of a VQC under the analysis of loss bounds. Our theory observes
-> knowledge transfer in VQCs and provides a heuristic interpretation for the
-> mechanism. An analytical fine-tuning method is derived to attain the optimal
-> transition for adaptations of similar domains.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on transfer learning and optimization of Variational Quantum Circuits (VQCs), with no apparent connection to Large Language Models (LLMs) or prompt engineering for text-based interactions, failing to meet the primary 'MUST' criteria.
-
----
-
-## [AI-Enabled Operations at Fermi Complex: Multivariate Time Series
-  Prediction for Outage Prediction and Diagnosis](https://arxiv.org/abs/http://arxiv.org/abs/2501.01509v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01509v1
-
-**Abstract:**
-> The Main Control Room of the Fermilab accelerator complex continuously
-> gathers extensive time-series data from thousands of sensors monitoring the
-> beam. However, unplanned events such as trips or voltage fluctuations often
-> result in beam outages, causing operational downtime. This downtime not only
-> consumes operator effort in diagnosing and addressing the issue but also leads
-> to unnecessary energy consumption by idle machines awaiting beam restoration.
-> The current threshold-based alarm system is reactive and faces challenges
-> including frequent false alarms and inconsistent outage-cause labeling. To
-> address these limitations, we propose an AI-enabled framework that leverages
-> predictive analytics and automated labeling. Using data from 2,703 Linac
-> devices and 80 operator-labeled outages, we evaluate state-of-the-art deep
-> learning architectures, including recurrent, attention-based, and linear
-> models, for beam outage prediction. Additionally, we assess a Random
-> Forest-based labeling system for providing consistent, confidence-scored outage
-> annotations. Our findings highlight the strengths and weaknesses of these
-> architectures for beam outage prediction and identify critical gaps that must
-> be addressed to fully harness AI for transitioning downtime handling from
-> reactive to predictive, ultimately reducing downtime and improving
-> decision-making in accelerator management.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on multivariate time series prediction for outage prediction and diagnosis in an accelerator complex using deep learning architectures, with no primary focus on prompt engineering for Large Language Models (LLMs) or manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [DiagrammaticLearning: A Graphical Language for Compositional Training
-  Regimes](https://arxiv.org/abs/http://arxiv.org/abs/2501.01515v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01515v1
-
-**Abstract:**
-> Motivated by deep learning regimes with multiple interacting yet distinct
-> model components, we introduce learning diagrams, graphical depictions of
-> training setups that capture parameterized learning as data rather than code. A
-> learning diagram compiles to a unique loss function on which component models
-> are trained. The result of training on this loss is a collection of models
-> whose predictions "agree" with one another. We show that a number of popular
-> learning setups such as few-shot multi-task learning, knowledge distillation,
-> and multi-modal learning can be depicted as learning diagrams. We further
-> implement learning diagrams in a library that allows users to build diagrams of
-> PyTorch and Flux.jl models. By implementing some classic machine learning use
-> cases, we demonstrate how learning diagrams allow practitioners to build
-> complicated models as compositions of smaller components, identify
-> relationships between workflows, and manipulate models during or after
-> training. Leveraging a category theoretic framework, we introduce a rigorous
-> semantics for learning diagrams that puts such operations on a firm
-> mathematical foundation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a graphical language for compositional training regimes and a library for building complex models, which does not meet the primary criterion of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Improving Robustness Estimates in Natural Language Explainable AI though
-  Synonymity Weighted Similarity Measures](https://arxiv.org/abs/http://arxiv.org/abs/2501.01516v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01516v1
-
-**Abstract:**
-> Explainable AI (XAI) has seen a surge in recent interest with the
-> proliferation of powerful but intractable black-box models. Moreover, XAI has
-> come under fire for techniques that may not offer reliable explanations. As
-> many of the methods in XAI are themselves models, adversarial examples have
-> been prominent in the literature surrounding the effectiveness of XAI, with the
-> objective of these examples being to alter the explanation while maintaining
-> the output of the original model. For explanations in natural language, it is
-> natural to use measures found in the domain of information retrieval for use
-> with ranked lists to guide the adversarial XAI process. We show that the
-> standard implementations of these measures are poorly suited for the comparison
-> of explanations in adversarial XAI and amend them by using information that is
-> discarded, the synonymity of perturbed words. This synonymity weighting
-> produces more accurate estimates of the actual weakness of XAI methods to
-> adversarial examples.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on improving robustness estimates in Explainable AI (XAI) using synonymity weighted similarity measures, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus not meeting the core subject requirement."
-}
-
----
-
-## [A Metasemantic-Metapragmatic Framework for Taxonomizing Multimodal
-  Communicative Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.01535v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01535v1
-
-**Abstract:**
-> Drawing on contemporary pragmatist philosophy and linguistic theories on
-> cognition, meaning, and communication, this paper presents a dynamic,
-> metasemantic-metapragmatic taxonomy for grounding and conceptualizing
-> human-like multimodal communicative alignment. The framework is rooted in
-> contemporary developments of the three basic communicative capacities initially
-> identified by American logician and pragmatist philosopher Charles Sanders
-> Peirce: iconic (sensory and perceptual qualities), indexical (contextual and
-> sociocultural associations), and rule-like (symbolic and intuitive reasoning).
-> Expanding on these developments, I introduce the concept of indexical
-> contextualization and propose the principle of "contextualization
-> directionality" for characterizing the crucial metapragmatic capacity for
-> maintaining, navigating, or transitioning between semantic and pragmatic modes
-> of multimodal communication. I contend that current cognitive-social
-> computational and engineering methodologies disproportionately emphasize the
-> semantic/metasemantic domain, overlooking the pivotal role of metapragmatic
-> indexicality in traversing the semantic-pragmatic spectrum of communication.
-> The framework's broader implications for intentionality, identity, affect, and
-> ethics in within-modal and cross-modal human-machine alignment are also
-> discussed.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria, as it primarily focuses on a philosophical framework for multimodal communicative alignment, with no clear emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [In Search of a Lost Metric: Human Empowerment as a Pillar of Socially
-  Conscious Navigation](https://arxiv.org/abs/http://arxiv.org/abs/2501.01539v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01539v1
-
-**Abstract:**
-> In social robot navigation, traditional metrics like proxemics and behavior
-> naturalness emphasize human comfort and adherence to social norms but often
-> fail to capture an agent's autonomy and adaptability in dynamic environments.
-> This paper introduces human empowerment, an information-theoretic concept that
-> measures a human's ability to influence their future states and observe those
-> changes, as a complementary metric for evaluating social compliance. This
-> metric reveals how robot navigation policies can indirectly impact human
-> empowerment. We present a framework that integrates human empowerment into the
-> evaluation of social performance in navigation tasks. Through numerical
-> simulations, we demonstrate that human empowerment as a metric not only aligns
-> with intuitive social behavior, but also shows statistically significant
-> differences across various robot navigation policies. These results provide a
-> deeper understanding of how different policies affect social compliance,
-> highlighting the potential of human empowerment as a complementary metric for
-> future research in social navigation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on social robot navigation, introducing 'human empowerment' as a metric for evaluating social compliance, and does not meet the 'MUST' criteria for primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or investigating methods to improve LLM performance through textual input prompt manipulation.
-
----
-
-## [BLAST: A Stealthy Backdoor Leverage Attack against Cooperative
-  Multi-Agent Deep Reinforcement Learning based Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.01593v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01593v1
-
-**Abstract:**
-> Recent studies have shown that cooperative multi-agent deep reinforcement
-> learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor
-> trigger is observed, it will perform malicious actions leading to failures or
-> malicious goals. However, existing backdoor attacks suffer from several issues,
-> e.g., instant trigger patterns lack stealthiness, the backdoor is trained or
-> activated by an additional network, or all agents are backdoored. To this end,
-> in this paper, we propose a novel backdoor leverage attack against c-MADRL,
-> BLAST, which attacks the entire multi-agent team by embedding the backdoor only
-> in a single agent. Firstly, we introduce adversary spatiotemporal behavior
-> patterns as the backdoor trigger rather than manual-injected fixed visual
-> patterns or instant status and control the period to perform malicious actions.
-> This method can guarantee the stealthiness and practicality of BLAST. Secondly,
-> we hack the original reward function of the backdoor agent via unilateral
-> guidance to inject BLAST, so as to achieve the *leverage attack effect*
-> that can pry open the entire multi-agent system via a single backdoor agent. We
-> evaluate our BLAST against 3 classic c-MADRL algorithms (VDN, QMIX, and MAPPO)
-> in 2 popular c-MADRL environments (SMAC and Pursuit), and 2 existing defense
-> mechanisms. The experimental results demonstrate that BLAST can achieve a high
-> attack success rate while maintaining a low clean performance variance rate.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on a backdoor leverage attack against cooperative multi-agent deep reinforcement learning systems, with no emphasis on Large Language Models (LLMs), prompt engineering, or textual input manipulation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of
-  Psychiatric Assessment Conversational Agents](https://arxiv.org/abs/http://arxiv.org/abs/2501.01594v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01594v1
-
-**Abstract:**
-> Recent advances in large language models (LLMs) have accelerated the
-> development of conversational agents capable of generating human-like
-> responses. Since psychiatric assessments typically involve complex
-> conversational interactions between psychiatrists and patients, there is
-> growing interest in developing LLM-based psychiatric assessment conversational
-> agents (PACAs) that aim to simulate the role of psychiatrists in clinical
-> evaluations. However, standardized methods for benchmarking the clinical
-> appropriateness of PACAs' interaction with patients still remain underexplored.
-> Here, we propose PSYCHE, a novel framework designed to enable the 1) clinically
-> relevant, 2) ethically safe, 3) cost-efficient, and 4) quantitative evaluation
-> of PACAs. This is achieved by simulating psychiatric patients based on a
-> multi-faceted psychiatric construct that defines the simulated patients'
-> profiles, histories, and behaviors, which PACAs are expected to assess. We
-> validate the effectiveness of PSYCHE through a study with 10 board-certified
-> psychiatrists, supported by an in-depth analysis of the simulated patient
-> utterances.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a framework for evaluating Psychiatric Assessment Conversational Agents (PACAs), with LLMs being a component, rather than focusing on prompt engineering for text-based interactions with LLMs, not meeting the core subject requirement.
-
----
-
-## [A non-ergodic framework for understanding emergent capabilities in Large
-  Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01638v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01638v1
-
-**Abstract:**
-> Large language models have emergent capabilities that come unexpectedly at
-> scale, but we need a theoretical framework to explain why and how they emerge.
-> We prove that language models are actually non-ergodic systems while providing
-> a mathematical framework based on Stuart Kauffman's theory of the adjacent
-> possible (TAP) to explain capability emergence. Our resource-constrained TAP
-> equation demonstrates how architectural, training, and contextual constraints
-> interact to shape model capabilities through phase transitions in semantic
-> space. We prove through experiments with three different language models that
-> capacities emerge through discrete transitions guided by constraint
-> interactions and path-dependent exploration. This framework provides a
-> theoretical basis for understanding emergence in language models and guides the
-> development of architectures that can guide capability emergence.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on developing a theoretical framework to understand emergent capabilities in LLMs, discussing architectural, training, and contextual constraints, but does not investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts."
-}
-
----
-
-## [AgentRefine: Enhancing Agent Generalization through Refinement Tuning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01702v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01702v1
-
-**Abstract:**
-> Large Language Model (LLM) based agents have proved their ability to perform
-> complex tasks like humans. However, there is still a large gap between
-> open-sourced LLMs and commercial models like the GPT series. In this paper, we
-> focus on improving the agent generalization capabilities of LLMs via
-> instruction tuning. We first observe that the existing agent training corpus
-> exhibits satisfactory results on held-in evaluation sets but fails to
-> generalize to held-out sets. These agent-tuning works face severe formatting
-> errors and are frequently stuck in the same mistake for a long while. We
-> analyze that the poor generalization ability comes from overfitting to several
-> manual agent environments and a lack of adaptation to new situations. They
-> struggle with the wrong action steps and cannot learn from the experience but
-> just memorize existing observation-action relations. Inspired by the insight,
-> we propose a novel AgentRefine framework for agent-tuning. The core idea is to
-> enable the model to learn to correct its mistakes via observation in the
-> trajectory. Specifically, we propose an agent synthesis framework to encompass
-> a diverse array of environments and tasks and prompt a strong LLM to refine its
-> error action according to the environment feedback. AgentRefine significantly
-> outperforms state-of-the-art agent-tuning work in terms of generalization
-> ability on diverse agent tasks. It also has better robustness facing
-> perturbation and can generate diversified thought in inference. Our findings
-> establish the correlation between agent generalization and self-refinement and
-> provide a new paradigm for future research.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on refining Large Language Model (LLM) based agents through instruction tuning to improve generalization capabilities, which aligns more with the development of new training methods for LLMs (excluded by MUST NOT 1) rather than primarily focusing on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Combined Hyper-Extensible Extremely-Secured Zero-Trust CIAM-PAM
-  architecture](https://arxiv.org/abs/http://arxiv.org/abs/2501.01732v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01732v1
-
-**Abstract:**
-> Customer Identity and Access Management (CIAM) systems play a pivotal role in
-> securing enterprise infrastructures. However, the complexity of implementing
-> these systems requires careful architectural planning to ensure positive Return
-> on Investment (RoI) and avoid costly delays. The proliferation of Active
-> Persistent cyber threats, coupled with advancements in AI, cloud computing, and
-> geographically distributed customer populations, necessitates a paradigm shift
-> towards adaptive and zero-trust security frameworks. This paper introduces the
-> Combined Hyper-Extensible Extremely-Secured Zero-Trust (CHEZ) CIAM-PAM
-> architecture, designed specifically for large-scale enterprises. The CHEZ PL
-> CIAM-PAM framework addresses critical security gaps by integrating federated
-> identity management (private and public identities), password-less
-> authentication, adaptive multi-factor authentication (MFA), microservice-based
-> PEP (Policy Entitlement Point), multi-layer RBAC (Role Based Access Control)
-> and multi-level trust systems. This future-proof design also includes
-> end-to-end data encryption, and seamless integration with state-of-the-art
-> AI-based threat detection systems, while ensuring compliance with stringent
-> regulatory standards.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a secure CIAM-PAM architecture for enterprise infrastructures, with no apparent focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), thus failing to meet the core subject requirement.
-
----
-
-## [How Toxic Can You Get? Search-based Toxicity Testing for Large Language
-  Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01741v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01741v1
-
-**Abstract:**
-> Language is a deep-rooted means of perpetration of stereotypes and
-> discrimination. Large Language Models (LLMs), now a pervasive technology in our
-> everyday lives, can cause extensive harm when prone to generating toxic
-> responses. The standard way to address this issue is to align the LLM, which,
-> however, dampens the issue without constituting a definitive solution.
-> Therefore, testing LLM even after alignment efforts remains crucial for
-> detecting any residual deviations with respect to ethical standards. We present
-> EvoTox, an automated testing framework for LLMs' inclination to toxicity,
-> providing a way to quantitatively assess how much LLMs can be pushed towards
-> toxic responses even in the presence of alignment. The framework adopts an
-> iterative evolution strategy that exploits the interplay between two LLMs, the
-> System Under Test (SUT) and the Prompt Generator steering SUT responses toward
-> higher toxicity. The toxicity level is assessed by an automated oracle based on
-> an existing toxicity classifier. We conduct a quantitative and qualitative
-> empirical evaluation using four state-of-the-art LLMs as evaluation subjects
-> having increasing complexity (7-13 billion parameters). Our quantitative
-> evaluation assesses the cost-effectiveness of four alternative versions of
-> EvoTox against existing baseline methods, based on random search, curated
-> datasets of toxic prompts, and adversarial attacks. Our qualitative assessment
-> engages human evaluators to rate the fluency of the generated prompts and the
-> perceived toxicity of the responses collected during the testing sessions.
-> Results indicate that the effectiveness, in terms of detected toxicity level,
-> is significantly higher than the selected baseline methods (effect size up to
-> 1.0 against random search and up to 0.99 against adversarial attacks).
-> Furthermore, EvoTox yields a limited cost overhead (from 22% to 35% on
-> average).
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "While the paper involves Large Language Models (LLMs) and manipulates textual input prompts to test toxicity, its primary focus is on developing a testing framework for detecting toxicity rather than optimizing prompt engineering \*for improved LLM performance\* through textual input manipulation."
-}
-
----
-
-## [Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic
-  Data Generation and Fairness Algorithms](https://arxiv.org/abs/http://arxiv.org/abs/2501.01785v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01785v1
-
-**Abstract:**
-> The increasing use of machine learning in learning analytics (LA) has raised
-> significant concerns around algorithmic fairness and privacy. Synthetic data
-> has emerged as a dual-purpose tool, enhancing privacy and improving fairness in
-> LA models. However, prior research suggests an inverse relationship between
-> fairness and privacy, making it challenging to optimize both. This study
-> investigates which synthetic data generators can best balance privacy and
-> fairness, and whether pre-processing fairness algorithms, typically applied to
-> real datasets, are effective on synthetic data. Our results highlight that the
-> DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between
-> privacy and fairness. However, DECAF suffers in utility, as reflected in its
-> predictive accuracy. Notably, we found that applying pre-processing fairness
-> algorithms to synthetic data improves fairness even more than when applied to
-> real data. These findings suggest that combining synthetic data generation with
-> fairness pre-processing offers a promising approach to creating fairer LA
-> models.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the primary focus criteria, as it focuses on synthetic data generation for balancing privacy and fairness in learning analytics models, rather than engineering, design, or optimization of prompts specifically for Large Language Models (LLMs)."
-}
-
----
-
-## [BERT4MIMO: A Foundation Model using BERT Architecture for Massive MIMO
-  Channel State Information Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.01802v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01802v1
-
-**Abstract:**
-> Massive MIMO (Multiple-Input Multiple-Output) is an advanced wireless
-> communication technology, using a large number of antennas to improve the
-> overall performance of the communication system in terms of capacity, spectral,
-> and energy efficiency. The performance of MIMO systems is highly dependent on
-> the quality of channel state information (CSI). Predicting CSI is, therefore,
-> essential for improving communication system performance, particularly in MIMO
-> systems, since it represents key characteristics of a wireless channel,
-> including propagation, fading, scattering, and path loss. This study proposes a
-> foundation model inspired by BERT, called BERT4MIMO, which is specifically
-> designed to process high-dimensional CSI data from massive MIMO systems.
-> BERT4MIMO offers superior performance in reconstructing CSI under varying
-> mobility scenarios and channel conditions through deep learning and attention
-> mechanisms. The experimental results demonstrate the effectiveness of BERT4MIMO
-> in a variety of wireless environments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new foundation model (BERT4MIMO) for predicting channel state information in Massive MIMO systems, which does not meet the primary criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large
-  Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.01830v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01830v1
-
-**Abstract:**
-> Automated red-teaming has become a crucial approach for uncovering
-> vulnerabilities in large language models (LLMs). However, most existing methods
-> focus on isolated safety flaws, limiting their ability to adapt to dynamic
-> defenses and uncover complex vulnerabilities efficiently. To address this
-> challenge, we propose Auto-RT, a reinforcement learning framework that
-> automatically explores and optimizes complex attack strategies to effectively
-> uncover security vulnerabilities through malicious queries. Specifically, we
-> introduce two key mechanisms to reduce exploration complexity and improve
-> strategy optimization: 1) Early-terminated Exploration, which accelerates
-> exploration by focusing on high-potential attack strategies; and 2) Progressive
-> Reward Tracking algorithm with intermediate downgrade models, which dynamically
-> refines the search trajectory toward successful vulnerability exploitation.
-> Extensive experiments across diverse LLMs demonstrate that, by significantly
-> improving exploration efficiency and automatically optimizing attack
-> strategies, Auto-RT detects a broader range of vulnerabilities, achieving a
-> faster detection speed and 16.63\% higher success rates compared to existing
-> methods.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on developing a reinforcement learning framework for automated red-teaming to uncover security vulnerabilities in LLMs, rather than focusing on the engineering, design, or optimization of prompts specifically for improving LLM performance through textual input manipulation."
-}
-
----
-
-## [LCFed: An Efficient Clustered Federated Learning Framework for
-  Heterogeneous Data](https://arxiv.org/abs/http://arxiv.org/abs/2501.01850v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01850v1
-
-**Abstract:**
-> Clustered federated learning (CFL) addresses the performance challenges posed
-> by data heterogeneity in federated learning (FL) by organizing edge devices
-> with similar data distributions into clusters, enabling collaborative model
-> training tailored to each group. However, existing CFL approaches strictly
-> limit knowledge sharing to within clusters, lacking the integration of global
-> knowledge with intra-cluster training, which leads to suboptimal performance.
-> Moreover, traditional clustering methods incur significant computational
-> overhead, especially as the number of edge devices increases. In this paper, we
-> propose LCFed, an efficient CFL framework to combat these challenges. By
-> leveraging model partitioning and adopting distinct aggregation strategies for
-> each sub-model, LCFed effectively incorporates global knowledge into
-> intra-cluster co-training, achieving optimal training performance.
-> Additionally, LCFed customizes a computationally efficient model similarity
-> measurement method based on low-rank models, enabling real-time cluster updates
-> with minimal computational overhead. Extensive experiments show that LCFed
-> outperforms state-of-the-art benchmarks in both test accuracy and clustering
-> computational efficiency.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a new federated learning framework (LCFed) for heterogeneous data, with no mention of prompt engineering, Large Language Models (LLMs), or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Evaluating Scenario-based Decision-making for Interactive Autonomous
-  Driving Using Rational Criteria: A Survey](https://arxiv.org/abs/http://arxiv.org/abs/2501.01886v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01886v1
-
-**Abstract:**
-> Autonomous vehicles (AVs) can significantly promote the advances in road
-> transport mobility in terms of safety, reliability, and decarbonization.
-> However, ensuring safety and efficiency in interactive driving within dynamic
-> and diverse environments is still a primary barrier to large-scale AV adoption.
-> In recent years, deep reinforcement learning (DRL) has emerged as an advanced
-> AI-based approach, enabling AVs to learn decision-making strategies adaptively
-> from data and interactions. DRL strategies are better suited than traditional
-> rule-based methods for handling complex, dynamic, and unpredictable driving
-> environments due to their adaptivity. However, varying driving scenarios
-> present distinct challenges, such as avoiding obstacles on highways and
-> reaching specific exits at intersections, requiring different scenario-specific
-> decision-making algorithms. Many DRL algorithms have been proposed in
-> interactive decision-making. However, a rational review of these DRL
-> algorithms across various scenarios is lacking. Therefore, a comprehensive
-> evaluation is essential to assess these algorithms from multiple perspectives,
-> including those of vehicle users and vehicle manufacturers. This survey reviews
-> the application of DRL algorithms in autonomous driving across typical
-> scenarios, summarizing road features and recent advancements. The scenarios
-> include highways, on-ramp merging, roundabouts, and unsignalized intersections.
-> Furthermore, DRL-based algorithms are evaluated based on five rational
-> criteria: driving safety, driving efficiency, training efficiency,
-> unselfishness, and interpretability (DDTUI). Each criterion of DDTUI is
-> specifically analyzed in relation to the reviewed algorithms. Finally, the
-> challenges for future DRL-based decision-making algorithms are summarized.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on autonomous driving, deep reinforcement learning algorithms, and decision-making for autonomous vehicles, rather than on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [QuArch: A Question-Answering Dataset for AI Agents in Computer
-  Architecture](https://arxiv.org/abs/http://arxiv.org/abs/2501.01892v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01892v2
-
-**Abstract:**
-> We introduce QuArch, a dataset of 1500 human-validated question-answer pairs
-> designed to evaluate and enhance language models' understanding of computer
-> architecture. The dataset covers areas including processor design, memory
-> systems, and performance optimization. Our analysis highlights a significant
-> performance gap: the best closed-source model achieves 84% accuracy, while the
-> top small open-source model reaches 72%. We observe notable struggles in memory
-> systems, interconnection networks, and benchmarking. Fine-tuning with QuArch
-> improves small model accuracy by up to 8%, establishing a foundation for
-> advancing AI-driven computer architecture research. The dataset and leaderboard
-> are at https://harvard-edge.github.io/QuArch/.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on introducing a dataset (QuArch) for evaluating and enhancing language models' understanding of computer architecture, rather than specifically investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [Mingling with the Good to Backdoor Federated Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.01913v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01913v1
-
-**Abstract:**
-> Federated learning (FL) is a decentralized machine learning technique that
-> allows multiple entities to jointly train a model while preserving dataset
-> privacy. However, its distributed nature has raised various security concerns,
-> which have been addressed by increasingly sophisticated defenses. These
-> protections utilize a range of data sources and metrics to, for example, filter
-> out malicious model updates, ensuring that the impact of attacks is minimized
-> or eliminated.
-> This paper explores the feasibility of designing a generic attack method
-> capable of installing backdoors in FL while evading a diverse array of
-> defenses. Specifically, we focus on an attacker strategy called MIGO, which
-> aims to produce model updates that subtly blend with legitimate ones. The
-> resulting effect is a gradual integration of a backdoor into the global model,
-> often ensuring its persistence long after the attack concludes, while
-> generating enough ambiguity to hinder the effectiveness of defenses.
-> MIGO was employed to implant three types of backdoors across five datasets
-> and different model architectures. The results demonstrate the significant
-> threat posed by these backdoors, as MIGO consistently achieved exceptionally
-> high backdoor accuracy (exceeding 90%) while maintaining the utility of the
-> main task. Moreover, MIGO exhibited strong evasion capabilities against ten
-> defenses, including several state-of-the-art methods. When compared to four
-> other attack strategies, MIGO consistently outperformed them across most
-> configurations. Notably, even in extreme scenarios where the attacker controls
-> just 0.1% of the clients, the results indicate that successful backdoor
-> insertion is possible if the attacker can persist for a sufficient number of
-> rounds.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on backdoor attacks in Federated Learning, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [On the Utility of Equivariance and Symmetry Breaking in Deep Learning
-  Architectures on Point Clouds](https://arxiv.org/abs/http://arxiv.org/abs/2501.01999v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01999v1
-
-**Abstract:**
-> This paper explores the key factors that influence the performance of models
-> working with point clouds, across different tasks of varying geometric
-> complexity. In this work, we explore the trade-offs between flexibility and
-> weight-sharing introduced by equivariant layers, assessing when equivariance
-> boosts or detracts from performance. It is often argued that providing more
-> information as input improves a model's performance. However, if this
-> additional information breaks certain properties, such as $\mathrm{SE}(3)$
-> equivariance, does it remain beneficial? We identify the key aspects of
-> equivariant and non-equivariant architectures that drive success in different
-> tasks by benchmarking them on segmentation, regression, and generation tasks
-> across multiple datasets with increasing complexity. We observe a positive
-> impact of equivariance, which becomes more pronounced with increasing task
-> complexity, even when strict equivariance is not required.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development and performance of deep learning architectures for point clouds, discussing equivariance and symmetry breaking, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions.
-
----
-
-## [Multi-Center Study on Deep Learning-Assisted Detection and
-  Classification of Fetal Central Nervous System Anomalies Using Ultrasound
-  Imaging](https://arxiv.org/abs/http://arxiv.org/abs/2501.02000v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02000v1
-
-**Abstract:**
-> Prenatal ultrasound evaluates fetal growth and detects congenital
-> abnormalities during pregnancy, but the examination of ultrasound images by
-> radiologists requires expertise and sophisticated equipment, which would
-> otherwise fail to improve the rate of identifying specific types of fetal
-> central nervous system (CNS) abnormalities and result in unnecessary patient
-> examinations. We construct a deep learning model to improve the overall
-> accuracy of the diagnosis of fetal cranial anomalies to aid prenatal diagnosis.
-> In our collected multi-center dataset of fetal craniocerebral anomalies
-> covering four typical anomalies of the fetal central nervous system (CNS):
-> anencephaly, encephalocele (including meningocele), holoprosencephaly, and
-> rachischisis, patient-level prediction accuracy reaches 94.5%, with an AUROC
-> value of 99.3%. In the subgroup analyses, our model is applicable to the entire
-> gestational period, with good identification of fetal anomaly types for any
-> gestational period. Heatmaps superimposed on the ultrasound images not only
-> provide a visual interpretation for the algorithm but also provide an intuitive
-> visual aid to the physician by highlighting key areas that need to be reviewed,
-> helping the physician to quickly identify and validate key areas. Finally, the
-> retrospective reader study demonstrates that by combining the automatic
-> prediction of the DL system with the professional judgment of the radiologist,
-> the diagnostic accuracy and efficiency can be effectively improved and the
-> misdiagnosis rate can be reduced, which has an important clinical application
-> prospect.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on deep learning-assisted detection and classification of fetal central nervous system anomalies using ultrasound imaging, which falls under medical subjects and does not involve prompt engineering for Large Language Models (LLMs) as required.
-
----
-
-## [General Information Metrics for Improving AI Model Training Efficiency](https://arxiv.org/abs/http://arxiv.org/abs/2501.02004v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02004v1
-
-**Abstract:**
-> To address the growing size of AI model training data and the lack of a
-> universal data selection methodology -- factors that significantly drive up
-> training costs -- this paper presents the General Information Metrics
-> Evaluation (GIME) method. GIME leverages general information metrics from
-> Objective Information Theory (OIT), including volume, delay, scope,
-> granularity, variety, duration, sampling rate, aggregation, coverage,
-> distortion, and mismatch to optimize dataset selection for training purposes.
-> Comprehensive experiments conducted across diverse domains, such as CTR
-> Prediction, Civil Case Prediction, and Weather Forecasting, demonstrate that
-> GIME effectively preserves model performance while substantially reducing both
-> training time and costs. Additionally, applying GIME within the Judicial AI
-> Program led to a remarkable 39.56% reduction in total model training expenses,
-> underscoring its potential to support efficient and sustainable AI development.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on optimizing dataset selection for AI model training efficiency using General Information Metrics, not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus failing to meet the primary 'MUST' criteria."
-}
-
----
-
-## [ST-HCSS: Deep Spatio-Temporal Hypergraph Convolutional Neural Network
-  for Soft Sensing](https://arxiv.org/abs/http://arxiv.org/abs/2501.02016v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02016v1
-
-**Abstract:**
-> Higher-order sensor networks are more accurate in characterizing the
-> nonlinear dynamics of sensory time-series data in modern industrial settings by
-> allowing multi-node connections beyond simple pairwise graph edges. In light of
-> this, we propose a deep spatio-temporal hypergraph convolutional neural network
-> for soft sensing (ST-HCSS). In particular, our proposed framework is able to
-> construct and leverage a higher-order graph (hypergraph) to model the complex
-> multi-interactions between sensor nodes in the absence of prior structural
-> knowledge. To capture rich spatio-temporal relationships underlying sensor
-> data, our proposed ST-HCSS incorporates stacked gated temporal and hypergraph
-> convolution layers to effectively aggregate and update hypergraph information
-> across time and nodes. Our results validate the superiority of ST-HCSS compared
-> to existing state-of-the-art soft sensors, and demonstrate that the learned
-> hypergraph feature representations align well with the sensor data
-> correlations. The code is available at https://github.com/htew0001/ST-HCSS.git
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on developing a new neural network architecture (ST-HCSS) for soft sensing in industrial settings, with no apparent primary focus on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompts."
-}
-
----
-
-## [Safeguarding Large Language Models in Real-time with Tunable
-  Safety-Performance Trade-offs](https://arxiv.org/abs/http://arxiv.org/abs/2501.02018v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02018v1
-
-**Abstract:**
-> Large Language Models (LLMs) have been shown to be susceptible to jailbreak
-> attacks, or adversarial attacks used to elicit high-risk behavior from a
-> model. Jailbreaks have been exploited by cybercriminals and blackhat actors to
-> cause significant harm, highlighting the critical need to safeguard
-> widely-deployed models. Safeguarding approaches, which include fine-tuning
-> models or having LLMs "self-reflect", may lengthen the inference time of a
-> model, incur a computational penalty, reduce the semantic fluency of an output,
-> and restrict "normal" model behavior. Importantly, these Safety-Performance
-> Trade-offs (SPTs) remain an understudied area. In this work, we introduce a
-> novel safeguard, called SafeNudge, that combines Controlled Text Generation
-> with "nudging", or using text interventions to change the behavior of a model.
-> SafeNudge triggers during text-generation while a jailbreak attack is being
-> executed, and can reduce successful jailbreak attempts by 30% by guiding the
-> LLM towards safe responses. It adds minimal latency to inference and has a
-> negligible impact on the semantic fluency of outputs. Further, we allow for
-> tunable SPTs. SafeNudge is open-source and available through https://pypi.org/,
-> and is compatible with models loaded with the Hugging Face "transformers"
-> library.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on safeguarding LLMs from jailbreak attacks through a novel method (SafeNudge), which involves Controlled Text Generation and 'nudging'. While it does manipulate textual input, the core subject is enhancing LLM safety and security, not prompt engineering for text-based interactions with LLMs, thus not meeting the 'MUST' criteria.
-
----
-
-## [Benchmarking Constraint-Based Bayesian Structure Learning Algorithms:
-  Role of Network Topology](https://arxiv.org/abs/http://arxiv.org/abs/2501.02019v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02019v1
-
-**Abstract:**
-> Modeling the associations between real world entities from their multivariate
-> cross-sectional profiles can provide cues into the concerted working of these
-> entities as a system. Several techniques have been proposed for deciphering
-> these associations including constraint-based Bayesian structure learning (BSL)
-> algorithms that model them as directed acyclic graphs. Benchmarking these
-> algorithms has typically focused on assessing the variation in performance
-> measures such as sensitivity as a function of the dimensionality represented by
-> the number of nodes in the DAG, and sample size. The present study elucidates
-> the importance of network topology in benchmarking exercises. More
-> specifically, it investigates variations in sensitivity across distinct network
-> topologies while constraining the nodes, edges, and sample-size to be
-> identical, eliminating these as potential confounders. Sensitivity of three
-> popular constraint-based BSL algorithms (Peter-Clark, Grow-Shrink, Incremental
-> Association Markov Blanket) in learning the network structure from multivariate
-> cross-sectional profiles sampled from network models with sub-linear, linear,
-> and super-linear DAG topologies generated using preferential attachment is
-> investigated. Results across linear and nonlinear models revealed statistically
-> significant $(\alpha=0.05)$ decrease in sensitivity estimates from sub-linear
-> to super-linear topology constitutively across the three algorithms. These
-> results are demonstrated on networks with nodes $(N_{nods}=48,64)$, noise
-> strengths $(\sigma =3,6)$ and sample size $(N = 2^{10})$. The findings
-> elucidate the importance of accommodating the network topology in
-> constraint-based BSL benchmarking exercises.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the 'MUST' criteria as it focuses on benchmarking constraint-based Bayesian structure learning algorithms, which is unrelated to engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not investigate the manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [Model Checking in Medical Imaging for Tumor Detection and Segmentation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02024v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02024v2
-
-**Abstract:**
-> Recent advancements in model checking have demonstrated significant potential
-> across diverse applications, particularly in signal and image analysis. Medical
-> imaging stands out as a critical domain where model checking can be effectively
-> applied to design and evaluate robust frameworks. These frameworks facilitate
-> automatic and semi-automatic delineation of regions of interest within images,
-> aiding in accurate segmentation. This paper provides a comprehensive analysis
-> of recent works leveraging spatial logic to develop operators and tools for
-> identifying regions of interest, including tumorous and non-tumorous areas.
-> Additionally, we examine the challenges inherent to spatial model-checking
-> techniques, such as variability in ground truth data and the need for
-> streamlined procedures suitable for routine clinical practice.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on medical imaging for tumor detection and segmentation, and model checking techniques, rather than prompt engineering for Large Language Models (LLMs), thus violating the 'MUST NOT' criteria regarding primary concern with medical subjects and not meeting the core subject requirement of prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large
-  Vision-Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02029v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02029v1
-
-**Abstract:**
-> With the integration of an additional modality, large vision-language models
-> (LVLMs) exhibit greater vulnerability to safety risks (e.g., jailbreaking)
-> compared to their language-only predecessors. Although recent studies have
-> devoted considerable effort to the post-hoc alignment of LVLMs, the inner
-> safety mechanisms remain largely unexplored. In this paper, we discover that
-> internal activations of LVLMs during the first token generation can effectively
-> identify malicious prompts across different attacks. This inherent safety
-> perception is governed by sparse attention heads, which we term "safety
-> heads." Further analysis reveals that these heads act as specialized shields
-> against malicious prompts; ablating them leads to higher attack success rates,
-> while the model's utility remains unaffected. By locating these safety heads
-> and concatenating their activations, we construct a straightforward but
-> powerful malicious prompt detector that integrates seamlessly into the
-> generation process with minimal extra inference overhead. Despite its simple
-> structure of a logistic regression model, the detector surprisingly exhibits
-> strong zero-shot generalization capabilities. Experiments across various
-> prompt-based attacks confirm the effectiveness of leveraging safety heads to
-> protect LVLMs. Code is available at https://github.com/Ziwei-Zheng/SAHs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on identifying safety risks in Large Vision-Language Models (LVLMs) through internal attention heads, rather than specifically engineering or optimizing textual input prompts for Large Language Models (LLMs), thus not meeting the primary focus criteria.
-
----
-
-## [Detecting Music Performance Errors with Transformers](https://arxiv.org/abs/http://arxiv.org/abs/2501.02030v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02030v1
-
-**Abstract:**
-> Beginner musicians often struggle to identify specific errors in their
-> performances, such as playing incorrect notes or rhythms. There are two
-> limitations in existing tools for music error detection: (1) Existing
-> approaches rely on automatic alignment; therefore, they are prone to errors
-> caused by small deviations between alignment targets; (2) There is a lack of
-> sufficient data to train music error detection models, resulting in
-> over-reliance on heuristics. To address (1), we propose a novel transformer
-> model, Polytune, that takes audio inputs and outputs annotated music scores.
-> This model can be trained end-to-end to implicitly align and compare
-> performance audio with music scores through latent space representations. To
-> address (2), we present a novel data generation technique capable of creating
-> large-scale synthetic music error datasets. Our approach achieves a 64.1%
-> average Error Detection F1 score, improving upon prior work by 40 percentage
-> points across 14 instruments. Additionally, compared with existing
-> transcription methods repurposed for music error detection, our model can
-> handle multiple instruments. Our source code and datasets are available at
-> https://github.com/ben2002chou/Polytune.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on music performance error detection using a novel transformer model and data generation technique, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or their textual input prompts.
-
----
-
-## [Dynamic Feature Fusion: Combining Global Graph Structures and Local
-  Semantics for Blockchain Fraud Detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.02032v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02032v1
-
-**Abstract:**
-> The advent of blockchain technology has facilitated the widespread adoption
-> of smart contracts in the financial sector. However, current fraud detection
-> methodologies exhibit limitations in capturing both global structural patterns
-> within transaction networks and local semantic relationships embedded in
-> transaction data. Most existing models focus on either structural information
-> or semantic features individually, leading to suboptimal performance in
-> detecting complex fraud patterns. In this paper, we propose a dynamic feature
-> fusion model that combines graph-based representation learning and semantic
-> feature extraction for blockchain fraud detection. Specifically, we construct
-> global graph representations to model account relationships and extract local
-> contextual features from transaction data. A dynamic multimodal fusion
-> mechanism is introduced to adaptively integrate these features, enabling the
-> model to capture both structural and semantic fraud patterns effectively. We
-> further develop a comprehensive data processing pipeline, including graph
-> construction, temporal feature enhancement, and text preprocessing.
-> Experimental results on large-scale real-world blockchain datasets demonstrate
-> that our method outperforms existing benchmarks across accuracy, F1 score, and
-> recall metrics. This work highlights the importance of integrating structural
-> relationships and semantic similarities for robust fraud detection and offers a
-> scalable solution for securing blockchain systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it concentrates on a dynamic feature fusion model for blockchain fraud detection, utilizing graph-based representation learning and semantic feature extraction, with no evident connection to LLM prompt engineering.
-
----
-
-## [Deep Clustering via Community Detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.02036v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02036v1
-
-**Abstract:**
-> Deep clustering is an essential task in modern artificial intelligence,
-> aiming to partition a set of data samples into a given number of homogeneous
-> groups (i.e., clusters). Even though many Deep Neural Network (DNN) backbones
-> and clustering strategies have been proposed for the task, achieving
-> increasingly improved performance, deep clustering remains very challenging due
-> to the lack of accurately labeled samples. In this paper, we propose a novel
-> approach of deep clustering via community detection. It initializes clustering
-> by detecting many communities, and then gradually expands clusters by community
-> merging. Compared with the existing clustering strategies, community detection
-> factors in the new perspective of cluster network analysis. As a result, it has
-> the inherent benefit of high pseudo-label purity, which is critical to the
-> performance of self-supervision. We have validated the efficacy of the proposed
-> approach on benchmark image datasets. Our extensive experiments have shown that
-> it can effectively improve the SOTA performance. Our ablation study also
-> demonstrates that the new network perspective can effectively improve community
-> pseudo-label purity, resulting in improved clustering performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on deep clustering via community detection for image datasets using Deep Neural Networks (DNNs), with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring](https://arxiv.org/abs/http://arxiv.org/abs/2501.02045v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02045v1
-
-**Abstract:**
-> We pretrain METAGENE-1, a 7-billion-parameter autoregressive transformer
-> model, which we refer to as a metagenomic foundation model, on a novel corpus
-> of diverse metagenomic DNA and RNA sequences comprising over 1.5 trillion base
-> pairs. This dataset is sourced from a large collection of human wastewater
-> samples, processed and sequenced using deep metagenomic (next-generation)
-> sequencing methods. Unlike genomic models that focus on individual genomes or
-> curated sets of specific species, the aim of METAGENE-1 is to capture the full
-> distribution of genomic information present within this wastewater, to aid in
-> tasks relevant to pandemic monitoring and pathogen detection. We carry out
-> byte-pair encoding (BPE) tokenization on our dataset, tailored for metagenomic
-> sequences, and then pretrain our model. In this paper, we first detail the
-> pretraining dataset, tokenization strategy, and model architecture,
-> highlighting the considerations and design choices that enable the effective
-> modeling of metagenomic data. We then show results of pretraining this model on
-> our metagenomic dataset, providing details about our losses, system metrics,
-> and training stability over the course of pretraining. Finally, we demonstrate
-> the performance of METAGENE-1, which achieves state-of-the-art results on a set
-> of genomic benchmarks and new evaluations focused on human-pathogen detection
-> and genomic sequence embedding, showcasing its potential for public health
-> applications in pandemic monitoring, biosurveillance, and early detection of
-> emerging health threats.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a new foundation model (METAGENE-1) for pandemic monitoring through pretraining on metagenomic sequences, rather than on prompt engineering for Large Language Models (LLMs), and does not investigate or propose methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Relaxation-assisted reverse annealing on nonnegative/binary matrix
-  factorization](https://arxiv.org/abs/http://arxiv.org/abs/2501.02114v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02114v1
-
-**Abstract:**
-> Quantum annealing has garnered significant attention as a metaheuristic
-> inspired by quantum physics for combinatorial optimization problems. Among its
-> many applications, nonnegative/binary matrix factorization stands out for its
-> complexity and relevance in unsupervised machine learning. The use of reverse
-> annealing, a derivative procedure of quantum annealing to prioritize the search
-> in a vicinity under a given initial state, helps improve its optimization
-> performance in matrix factorization. This study proposes an improved strategy
-> that integrates reverse annealing with a linear programming relaxation
-> technique. Using relaxed solutions as the initial configuration for reverse
-> annealing, we demonstrate improvements in optimization performance comparable
-> to the exact optimization methods. Our experiments on facial image datasets
-> show that our method provides better convergence than known reverse annealing
-> methods. Furthermore, we investigate the effectiveness of relaxation-based
-> initialization methods on randomized datasets, demonstrating a relationship
-> between the relaxed solution and the optimal solution. This research
-> underscores the potential of combining reverse annealing and classical
-> optimization strategies to enhance optimization performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on optimizing nonnegative/binary matrix factorization using quantum annealing and linear programming relaxation, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [A hybrid marketplace of ideas](https://arxiv.org/abs/http://arxiv.org/abs/2501.02132v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02132v2
-
-**Abstract:**
-> The convergence of humans and artificial intelligence systems introduces new
-> dynamics into the cultural and intellectual landscape. Complementing emerging
-> cultural evolution concepts such as machine culture, AI agents represent a
-> significant techno-sociological development, particularly within the
-> anthropological study of Web3 as a community focused on decentralization
-> through blockchain. Despite their growing presence, the cultural significance
-> of AI agents remains largely unexplored in academic literature. Toward this
-> end, we conceived hybrid netnography, a novel interdisciplinary approach that
-> examines the cultural and intellectual dynamics within digital ecosystems by
-> analyzing the interactions and contributions of both human and AI agents as
-> co-participants in shaping narratives, ideas, and cultural artifacts. We argue
-> that, within the Web3 community on the social media platform X, these agents
-> challenge traditional notions of participation and influence in public
-> discourse, creating a hybrid marketplace of ideas, a conceptual space where
-> human and AI generated ideas coexist and compete for attention. We examine the
-> current state of AI agents in idea generation, propagation, and engagement,
-> positioning their role as cultural agents through the lens of memetics and
-> encouraging further inquiry into their cultural and societal impact.
-> Additionally, we address the implications of this paradigm for privacy,
-> intellectual property, and governance, highlighting the societal and legal
-> challenges of integrating AI agents into the hybrid marketplace of ideas.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on the societal and cultural impact of AI agents in a Web3 community, not on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [Effective LLM-Driven Code Generation with Pythoness](https://arxiv.org/abs/http://arxiv.org/abs/2501.02138v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02138v1
-
-**Abstract:**
-> The advent of large language models (LLMs) has paved the way for a new era of
-> programming tools with both significant capabilities and risks, as the
-> generated code lacks guarantees of correctness and reliability. Developers
-> using LLMs currently face the difficult task of optimizing, integrating, and
-> maintaining code generated by AI. We propose an embedded domain-specific
-> language (DSL), Pythoness, to address those challenges. In Pythoness,
-> developers program with LLMs at a higher level of abstraction. Rather than
-> interacting directly with generated code, developers using Pythoness operate at
-> the level of behavioral specifications when writing functions, classes, or an
-> entire program. These specifications can take the form of unit tests and
-> property-based tests, which may be expressed formally or in natural language.
-> Guided by these specifications, Pythoness generates code that both passes the
-> tests and can be continuously checked during execution. We posit that the
-> Pythoness approach lets developers harness the full potential of LLMs for code
-> generation while substantially mitigating their inherent risks. We describe our
-> current prototype implementation of Pythoness and demonstrate that it can
-> successfully leverage a combination of tests and code generation to yield
-> higher quality code than specifications alone.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing an embedded domain-specific language (DSL) for safer code generation with LLMs, rather than on prompt engineering for text-based interactions with LLMs. While LLMs are utilized, the core subject is the DSL (Pythoness) and its application, not the optimization or design of textual input prompts for improving LLM performance.
-
----
-
-## [Establishing baselines for generative discovery of inorganic crystals](https://arxiv.org/abs/http://arxiv.org/abs/2501.02144v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02144v1
-
-**Abstract:**
-> Generative artificial intelligence offers a promising avenue for materials
-> discovery, yet its advantages over traditional methods remain unclear. In this
-> work, we introduce and benchmark two baseline approaches - random enumeration
-> of charge-balanced prototypes and data-driven ion exchange of known compounds -
-> against three generative models: a variational autoencoder, a large language
-> model, and a diffusion model. Our results show that established methods such as
-> ion exchange perform comparably well in generating stable materials, although
-> many of these materials tend to closely resemble known compounds. In contrast,
-> generative models excel at proposing novel structural frameworks and, when
-> sufficient training data is available, can more effectively target properties
-> such as electronic band gap and bulk modulus while maintaining a high stability
-> rate. To enhance the performance of both the baseline and generative
-> approaches, we implement a post-generation screening step in which all proposed
-> structures are passed through stability and property filters from pre-trained
-> machine learning models including universal interatomic potentials. This
-> low-cost filtering step leads to substantial improvement in the success rates
-> of all methods, remains computationally efficient, and ultimately provides a
-> practical pathway toward more effective generative strategies for materials
-> discovery.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on generative discovery of inorganic crystals using various AI models, including LLMs, but does not centralize prompt engineering for text-based interactions with LLMs. Instead, LLMs are utilized as one of several generative models for materials discovery, not meeting the 'MUST' criteria of focusing primarily on prompt engineering for LLMs."
-}
-
----
-
-## [Plasma-CycleGAN: Plasma Biomarker-Guided MRI to PET Cross-modality
-  Translation Using Conditional CycleGAN](https://arxiv.org/abs/http://arxiv.org/abs/2501.02146v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02146v1
-
-**Abstract:**
-> Cross-modality translation between MRI and PET imaging is challenging due to
-> the distinct mechanisms underlying these modalities. Blood-based biomarkers
-> (BBBMs) are revolutionizing Alzheimer's disease (AD) detection by identifying
-> patients and quantifying brain amyloid levels. However, the potential of BBBMs
-> to enhance PET image synthesis remains unexplored. In this paper, we performed
-> a thorough study on the effect of incorporating BBBM into deep generative
-> models. By evaluating three widely used cross-modality translation models, we
-> found that BBBM integration consistently enhances the generative quality
-> across all models. By visual inspection of the generated results, we observed
-> that PET images generated by CycleGAN exhibit the best visual fidelity. Based
-> on these findings, we propose Plasma-CycleGAN, a novel generative model based
-> on CycleGAN, to synthesize PET images from MRI using BBBMs as conditions. This
-> is the first approach to integrate BBBMs in conditional cross-modality
-> translation between MRI and PET.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on cross-modality translation between MRI and PET imaging using conditional CycleGAN, with an emphasis on integrating blood-based biomarkers for Alzheimer's disease detection, and does not meet the 'MUST' criteria as it lacks focus on prompt engineering, LLMs, and textual input manipulation.
-
----
-
-## [The Race to Efficiency: A New Perspective on AI Scaling Laws](https://arxiv.org/abs/http://arxiv.org/abs/2501.02156v3)
-**arXiv ID:** http://arxiv.org/abs/2501.02156v3
-
-**Abstract:**
-> As large-scale AI models expand, training becomes costlier and sustaining
-> progress grows harder. Classical scaling laws (e.g., Kaplan et al. (2020),
-> Hoffmann et al. (2022)) predict training loss from a static compute budget yet
-> neglect time and efficiency, prompting the question: how can we balance
-> ballooning GPU fleets with rapidly improving hardware and algorithms? We
-> introduce the relative-loss equation, a time- and efficiency-aware framework
-> that extends classical AI scaling laws. Our model shows that, without ongoing
-> efficiency gains, advanced performance could demand millennia of training or
-> unrealistically large GPU fleets. However, near-exponential progress remains
-> achievable if the "efficiency-doubling rate" parallels Moore's Law. By
-> formalizing this race to efficiency, we offer a quantitative roadmap for
-> balancing front-loaded GPU investments with incremental improvements across the
-> AI stack. Empirical trends suggest that sustained efficiency gains can push AI
-> scaling well into the coming decade, providing a new perspective on the
-> diminishing returns inherent in classical scaling.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on AI scaling laws, efficiency, and balancing GPU investments, with no clear emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria.
-
----
-
-## [Can ChatGPT implement finite element models for geotechnical engineering
-  applications?](https://arxiv.org/abs/http://arxiv.org/abs/2501.02199v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02199v1
-
-**Abstract:**
-> This study assesses the capability of ChatGPT to generate finite element code
-> for geotechnical engineering applications from a set of prompts. We tested
-> three different initial boundary value problems using a hydro-mechanically
-> coupled formulation for unsaturated soils, including the dissipation of excess
-> pore water pressure through fluid mass diffusion in one-dimensional space,
-> time-dependent differential settlement of a strip footing, and gravity-driven
-> seepage. For each case, initial prompting involved providing ChatGPT with
-> necessary information for finite element implementation, such as balance and
-> constitutive equations, problem geometry, initial and boundary conditions,
-> material properties, and spatiotemporal discretization and solution strategies.
-> Any errors and unexpected results were further addressed through prompt
-> augmentation processes until the ChatGPT-generated finite element code passed
-> the verification/validation test. Our results demonstrate that ChatGPT required
-> minimal code revisions when using the FEniCS finite element library, owing to
-> its high-level interfaces that enable efficient programming. In contrast, the
-> MATLAB code generated by ChatGPT necessitated extensive prompt augmentations
-> and/or direct human intervention, as it involves a significant amount of
-> low-level programming required for finite element analysis, such as
-> constructing shape functions or assembling global matrices. Given that prompt
-> engineering for this task requires an understanding of the mathematical
-> formulation and numerical techniques, this study suggests that while a large
-> language model may not yet replace human programmers, it can greatly assist in
-> the implementation of numerical models.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "While the paper does investigate the manipulation of textual input prompts for LLMs (ChatGPT), its primary focus is on the application of LLMs in geotechnical engineering (finite element models) rather than the engineering, design, or optimization of prompts specifically for Large Language Models."
-}
-
----
-
-## [Learning Evolution via Optimization Knowledge Adaptation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02200v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02200v1
-
-**Abstract:**
-> Evolutionary algorithms (EAs) maintain populations through evolutionary
-> operators to discover diverse solutions for complex tasks while gathering
-> valuable knowledge, such as historical population data and fitness evaluations.
-> However, traditional EAs face challenges in dynamically adapting to expanding
-> knowledge bases, hindering the efficient exploitation of accumulated
-> information and limiting adaptability to new situations. To address these
-> issues, we introduce an Optimization Knowledge Adaptation Evolutionary Model
-> (OKAEM), which features dynamic parameter adjustment using accumulated
-> knowledge to enhance its optimization capabilities. OKAEM employs attention
-> mechanisms to model the interactions among individuals, fitness landscapes, and
-> genetic components separately, thereby parameterizing the evolutionary
-> operators of selection, crossover, and mutation. These powerful learnable
-> operators enable OKAEM to benefit from pre-learned extensive prior knowledge
-> and self-tune with real-time evolutionary insights. Experimental results
-> demonstrate that OKAEM: 1) exploits prior knowledge for significant performance
-> gains across various knowledge transfer settings; 2) achieves competitive
-> performance through self-tuning alone, even without prior knowledge; 3)
-> outperforms state-of-the-art black-box baselines in a vision-language model
-> tuning case; 4) can improve its optimization capabilities with growing
-> knowledge; 5) is capable of emulating principles of natural selection and
-> genetic recombination.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on evolutionary algorithms and optimization knowledge adaptation for improving performance in complex tasks, with a specific application in vision-language model tuning. It does not meet the 'MUST' criteria: it neither focuses primarily on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) nor investigates methods to improve LLM performance through prompt manipulation.
-
----
-
-## [Diffusion Model-Based Data Synthesis Aided Federated Semi-Supervised
-  Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02219v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02219v1
-
-**Abstract:**
-> Federated semi-supervised learning (FSSL) is primarily challenged by two
-> factors: the scarcity of labeled data across clients and the non-independent
-> and identically distributed (non-IID) nature of data among clients. In this
-> paper, we propose a novel approach, diffusion model-based data synthesis aided
-> FSSL (DDSA-FSSL), which utilizes a diffusion model (DM) to generate synthetic
-> data, bridging the gap between heterogeneous local data distributions and the
-> global data distribution. In DDSA-FSSL, clients address the challenge of the
-> scarcity of labeled data by employing a federated learning-trained classifier
-> to perform pseudo labeling for unlabeled data. The DM is then collaboratively
-> trained using both labeled and precision-optimized pseudo-labeled data,
-> enabling clients to generate synthetic samples for classes that are absent in
-> their labeled datasets. This process allows clients to generate more
-> comprehensive synthetic datasets aligned with the global distribution.
-> Extensive experiments conducted on multiple datasets and varying non-IID
-> distributions demonstrate the effectiveness of DDSA-FSSL, e.g., it improves
-> accuracy from 38.46% to 52.14% on CIFAR-10 datasets with 10% labeled data.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on federated semi-supervised learning and utilizes a diffusion model for data synthesis, with no primary emphasis on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [CORD: Generalizable Cooperation via Role Diversity](https://arxiv.org/abs/http://arxiv.org/abs/2501.02221v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02221v2
-
-**Abstract:**
-> Cooperative multi-agent reinforcement learning (MARL) aims to develop agents
-> that can collaborate effectively. However, most cooperative MARL methods
-> overfit training agents, making learned policies not generalize well to unseen
-> collaborators, which is a critical issue for real-world deployment. Some
-> methods attempt to address the generalization problem but require prior
-> knowledge or predefined policies of new teammates, limiting real-world
-> applications. To this end, we propose a hierarchical MARL approach to enable
-> generalizable cooperation via role diversity, namely CORD. CORD's high-level
-> controller assigns roles to low-level agents by maximizing the role entropy
-> with constraints. We show this constrained objective can be decomposed into
-> causal influence in role that enables reasonable role assignment, and role
-> heterogeneity that yields coherent, non-redundant role clusters. Evaluated on a
-> variety of cooperative multi-agent tasks, CORD achieves better performance than
-> baselines, especially in generalization tests. Ablation studies further
-> demonstrate the efficacy of the constrained objective in generalizable
-> cooperation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on cooperative multi-agent reinforcement learning (MARL) and a hierarchical approach for generalizable cooperation, without any mention of Large Language Models (LLMs), prompt engineering, or textual input manipulation, thus failing to meet all the 'MUST' criteria.
-
----
-
-## [Towards a constructive framework for control theory](https://arxiv.org/abs/http://arxiv.org/abs/2501.02267v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02267v1
-
-**Abstract:**
-> This work presents a framework for control theory based on constructive
-> analysis to account for discrepancy between mathematical results and their
-> implementation in a computer, also referred to as computational uncertainty. In
-> control engineering, the latter is usually either neglected or considered
-> submerged into some other type of uncertainty, such as system noise, and
-> addressed within robust control. However, even robust control methods may be
-> compromised when the mathematical objects involved in the respective algorithms
-> fail to exist in exact form and subsequently fail to satisfy the required
-> properties. For instance, in general stabilization using a control Lyapunov
-> function, computational uncertainty may distort stability certificates or even
-> destabilize the system despite robustness of the stabilization routine with
-> regards to system, actuator and measurement noise. In fact, battling numerical
-> problems in practical implementation of controllers is common among control
-> engineers. Such observations indicate that computational uncertainty should
-> indeed be addressed explicitly in controller synthesis and system analysis. The
-> major contribution here is a fairly general framework for proof techniques in
-> analysis and synthesis of control systems based on constructive analysis which
-> explicitly states that every computation be doable only up to a finite
-> precision thus accounting for computational uncertainty. A series of previous
-> works is overviewed, including constructive system stability and stabilization,
-> approximate optimal controls, eigenvalue problems, Caratheodory trajectories,
-> measurable selectors. Additionally, a new constructive version of the Danskin's
-> theorem, which is crucial in adversarial defense, is presented.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary criteria as it focuses on control theory and computational uncertainty in control system engineering, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Deep Learning-Driven Segmentation of Ischemic Stroke Lesions Using
-  Multi-Channel MRI](https://arxiv.org/abs/http://arxiv.org/abs/2501.02287v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02287v1
-
-**Abstract:**
-> Ischemic stroke, caused by cerebral vessel occlusion, presents substantial
-> challenges in medical imaging due to the variability and subtlety of stroke
-> lesions. Magnetic Resonance Imaging (MRI) plays a crucial role in diagnosing
-> and managing ischemic stroke, yet existing segmentation techniques often fail
-> to accurately delineate lesions. This study introduces a novel deep
-> learning-based method for segmenting ischemic stroke lesions using
-> multi-channel MRI modalities, including Diffusion Weighted Imaging (DWI),
-> Apparent Diffusion Coefficient (ADC), and enhanced Diffusion Weighted Imaging
-> (eDWI). The proposed architecture integrates DenseNet121 as the encoder with
-> Self-Organized Operational Neural Networks (SelfONN) in the decoder, enhanced
-> by Channel and Space Compound Attention (CSCA) and Double
-> Squeeze-and-Excitation (DSE) blocks. Additionally, a custom loss function
-> combining Dice Loss and Jaccard Loss with weighted averages is introduced to
-> improve model performance. Trained and evaluated on the ISLES 2022 dataset, the
-> model achieved Dice Similarity Coefficients (DSC) of 83.88% using DWI alone,
-> 85.86% with DWI and ADC, and 87.49% with the integration of DWI, ADC, and eDWI.
-> This approach not only outperforms existing methods but also addresses key
-> limitations in current segmentation practices. These advancements significantly
-> enhance diagnostic precision and treatment planning for ischemic stroke,
-> providing valuable support for clinical decision-making.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a deep learning-based method for segmenting ischemic stroke lesions using MRI modalities, which is a medical application unrelated to Large Language Models (LLMs) or prompt engineering, thus failing to meet all 'MUST' criteria.
-
----
-
-## [DiffGraph: Heterogeneous Graph Diffusion Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.02313v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02313v1
-
-**Abstract:**
-> Recent advances in Graph Neural Networks (GNNs) have revolutionized
-> graph-structured data modeling, yet traditional GNNs struggle with complex
-> heterogeneous structures prevalent in real-world scenarios. Despite progress in
-> handling heterogeneous interactions, two fundamental challenges persist: noisy
-> data significantly compromising embedding quality and learning performance, and
-> existing methods' inability to capture intricate semantic transitions among
-> heterogeneous relations, which impacts downstream predictions. To address these
-> fundamental issues, we present the Heterogeneous Graph Diffusion Model
-> (DiffGraph), a pioneering framework that introduces an innovative cross-view
-> denoising strategy. This advanced approach transforms auxiliary heterogeneous
-> data into target semantic spaces, enabling precise distillation of
-> task-relevant information. At its core, DiffGraph features a sophisticated
-> latent heterogeneous graph diffusion mechanism, implementing a novel forward
-> and backward diffusion process for superior noise management. This methodology
-> achieves simultaneous heterogeneous graph denoising and cross-type transition,
-> while significantly simplifying graph generation through its latent-space
-> diffusion capabilities. Through rigorous experimental validation on both public
-> and industrial datasets, we demonstrate that DiffGraph consistently surpasses
-> existing methods in link prediction and node classification tasks, establishing
-> new benchmarks for robustness and efficiency in heterogeneous graph processing.
-> The model implementation is publicly available at:
-> https://github.com/HKUDS/DiffGraph.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new graph neural network model (DiffGraph) for handling heterogeneous graph-structured data, primarily addressing issues of noise and semantic transitions in graph processing, with no clear emphasis on prompt engineering for Large Language Models (LLMs) or manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Validity Arguments For Constructed Response Scoring Using Generative
-  Artificial Intelligence Applications](https://arxiv.org/abs/http://arxiv.org/abs/2501.02334v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02334v1
-
-**Abstract:**
-> The rapid advancements in large language models and generative artificial
-> intelligence (AI) capabilities are making their broad application in the
-> high-stakes testing context more likely. Use of generative AI in the scoring of
-> constructed responses is particularly appealing because it reduces the effort
-> required for handcrafting features in traditional AI scoring and might even
-> outperform those methods. The purpose of this paper is to highlight the
-> differences in the feature-based and generative AI applications in constructed
-> response scoring systems and propose a set of best practices for the collection
-> of validity evidence to support the use and interpretation of constructed
-> response scores from scoring systems using generative AI. We compare the
-> validity evidence needed in scoring systems using human ratings, feature-based
-> natural language processing AI scoring engines, and generative AI. The evidence
-> needed in the generative AI context is more extensive than in the feature-based
-> NLP scoring context because of the lack of transparency and other concerns
-> unique to generative AI such as consistency. Constructed response score data
-> from standardized tests demonstrate the collection of validity evidence for
-> different types of scoring systems and highlights the numerous complexities and
-> considerations when making a validity argument for these scores. In addition,
-> we discuss how the evaluation of AI scores might include a consideration of how
-> a contributory scoring approach combining multiple AI scores (from different
-> sources) will cover more of the construct in the absence of human ratings.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on validity arguments for scoring systems using generative AI in high-stakes testing, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Prompt engineering is not the central concern; the paper's main subject is the application and validation of generative AI in a testing context, so it is excluded under the 'MUST NOT' criteria.
-
----
-
-## [GNSS/GPS Spoofing and Jamming Identification Using Machine Learning and
-  Deep Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02352v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02352v1
-
-**Abstract:**
-> The increasing reliance on Global Navigation Satellite Systems (GNSS),
-> particularly the Global Positioning System (GPS), underscores the urgent need
-> to safeguard these technologies against malicious threats such as spoofing and
-> jamming. As the backbone for positioning, navigation, and timing (PNT) across
-> various applications, including transportation, telecommunications, and
-> emergency services, GNSS is vulnerable to deliberate interference that poses
-> significant risks. Spoofing attacks, which involve transmitting counterfeit
-> GNSS signals to mislead receivers into calculating incorrect positions, can
-> result in serious consequences, from navigational errors in civilian aviation
-> to security breaches in military operations. Furthermore, the lack of inherent
-> security measures within GNSS systems makes them attractive targets for
-> adversaries. While GNSS/GPS jamming and spoofing systems consist of numerous
-> components, the ability to distinguish authentic signals from malicious ones is
-> essential for maintaining system integrity. Recent advancements in machine
-> learning and deep learning provide promising avenues for enhancing detection
-> and mitigation strategies against these threats. This paper addresses both
-> spoofing and jamming by tackling real-world challenges through machine
-> learning, deep learning, and computer vision techniques. Through extensive
-> experiments on two real-world datasets related to spoofing and jamming
-> detection using advanced algorithms, we achieved state of the art results. In
-> the GNSS/GPS jamming detection task, we attained approximately 99% accuracy,
-> improving performance by around 5% compared to previous studies. Additionally,
-> we addressed a challenging task related to spoofing detection, yielding
-> results that underscore the potential of machine learning and deep learning in
-> this domain.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet any of the 'MUST' criteria, as it focuses primarily on using machine learning and deep learning for GNSS/GPS spoofing and jamming identification, with no mention of Large Language Models (LLMs), prompt engineering, or text generation.
-
----
-
-## [FaceSpeak: Expressive and High-Quality Speech Synthesis from Human
-  Portraits of Different Styles](https://arxiv.org/abs/http://arxiv.org/abs/2501.03181v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03181v1
-
-**Abstract:**
-> Humans can perceive speakers' characteristics (e.g., identity, gender,
-> personality and emotion) by their appearance, which are generally aligned to
-> their voice style. Recently, vision-driven Text-to-speech (TTS) scholars
-> grounded their investigations on real-person faces, thereby restricting
-> effective speech synthesis from applying to vast potential usage scenarios with
-> diverse characters and image styles. To solve this issue, we introduce a novel
-> FaceSpeak approach. It extracts salient identity characteristics and emotional
-> representations from a wide variety of image styles. Meanwhile, it mitigates
-> the extraneous information (e.g., background, clothing, and hair color, etc.),
-> resulting in synthesized speech closely aligned with a character's persona.
-> Furthermore, to overcome the scarcity of multi-modal TTS data, we have devised
-> an innovative dataset, namely Expressive Multi-Modal TTS, which is diligently
-> curated and annotated to facilitate research in this domain. The experimental
-> results demonstrate our proposed FaceSpeak can generate portrait-aligned voice
-> with satisfactory naturalness and quality.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on speech synthesis from human portraits using a novel FaceSpeak approach, which involves image processing and Text-to-Speech (TTS) for audio generation. It does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts, nor does it provide concrete examples of prompts and their impact on LLM output.
-
----
-
-## [Classifier-Guided Captioning Across Modalities](https://arxiv.org/abs/http://arxiv.org/abs/2501.03183v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03183v1
-
-**Abstract:**
-> Most current captioning systems use language models trained on data from
-> specific settings, such as image-based captioning via Amazon Mechanical Turk,
-> limiting their ability to generalize to other modality distributions and
-> contexts. This limitation hinders performance in tasks like audio or video
-> captioning, where different semantic cues are needed. Addressing this challenge
-> is crucial for creating more adaptable and versatile captioning frameworks
-> applicable across diverse real-world contexts. In this work, we introduce a
-> method to adapt captioning networks to the semantics of alternative settings,
-> such as capturing audibility in audio captioning, where it is crucial to
-> describe sounds and their sources. Our framework consists of two main
-> components: (i) a frozen captioning system incorporating a language model (LM),
-> and (ii) a text classifier that guides the captioning system. The classifier is
-> trained on a dataset automatically generated by GPT-4, using tailored prompts
-> specifically designed to enhance key aspects of the generated captions.
-> Importantly, the framework operates solely during inference, eliminating the
-> need for further training of the underlying captioning model. We evaluate the
-> framework on various models and modalities, with a focus on audio captioning,
-> and report promising results. Notably, when combined with an existing zero-shot
-> audio captioning system, our framework improves its quality and sets
-> state-of-the-art performance in zero-shot audio captioning.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper mentions using tailored prompts with GPT-4, the primary focus is on adapting captioning networks across modalities (image, audio, video) and improving the captioning system with a classifier, rather than prompt engineering for Large Language Models (LLMs) being the core subject.
-
----
-
-## [Breaking Through the Spike: Spike Window Decoding for Accelerated and
-  Precise Automatic Speech Recognition](https://arxiv.org/abs/http://arxiv.org/abs/2501.03257v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03257v1
-
-**Abstract:**
-> Recently, end-to-end automatic speech recognition has become the mainstream
-> approach in both industry and academia. To optimize system performance in
-> specific scenarios, the Weighted Finite-State Transducer (WFST) is extensively
-> used to integrate acoustic and language models, leveraging its capacity to
-> implicitly fuse language models within static graphs, thereby ensuring robust
-> recognition while also facilitating rapid error correction. However, WFST
-> necessitates a frame-by-frame search of CTC posterior probabilities through
-> autoregression, which significantly hampers inference speed. In this work, we
-> thoroughly investigate the spike property of CTC outputs and further propose
-> the conjecture that adjacent frames to non-blank spikes carry semantic
-> information beneficial to the model. Building on this, we propose the Spike
-> Window Decoding algorithm, which greatly improves the inference speed by making
-> the number of frames decoded in WFST linearly related to the number of spiking
-> frames in the CTC output, while guaranteeing the recognition performance. Our
-> method achieves SOTA recognition accuracy with significantly accelerated
-> decoding speed, proven across both AISHELL-1 and large-scale In-House datasets,
-> establishing a pioneering approach for integrating CTC output with WFST.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on optimizing Automatic Speech Recognition (ASR) using Spike Window Decoding and Weighted Finite-State Transducers, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Navigation Variable-based Multi-objective Particle Swarm Optimization
-  for UAV Path Planning with Kinematic Constraints](https://arxiv.org/abs/http://arxiv.org/abs/2501.03261v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03261v1
-
-**Abstract:**
-> Path planning is essential for unmanned aerial vehicles (UAVs) as it
-> determines the path that the UAV needs to follow to complete a task. This work
-> addresses this problem by introducing a new algorithm called navigation
-> variable-based multi-objective particle swarm optimization (NMOPSO). It first
-> models path planning as an optimization problem via the definition of a set of
-> objective functions that include optimality and safety requirements for UAV
-> operation. The NMOPSO is then used to minimize those functions through Pareto
-> optimal solutions. The algorithm features a new path representation based on
-> navigation variables to include kinematic constraints and exploit the
-> maneuverable characteristics of the UAV. It also includes an adaptive mutation
-> mechanism to enhance the diversity of the swarm for better solutions.
-> Comparisons with various algorithms have been carried out to benchmark the
-> proposed approach. The results indicate that the NMOPSO performs better than
-> not only other particle swarm optimization variants but also other
-> state-of-the-art multi-objective and metaheuristic optimization algorithms.
-> Experiments have also been conducted with real UAVs to confirm the validity of
-> the approach for practical flights. The source code of the algorithm is
-> available at https://github.com/ngandng/NMOPSO.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new optimization algorithm for UAV path planning, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Bridge the Inference Gaps of Neural Processes via Expectation
-  Maximization](https://arxiv.org/abs/http://arxiv.org/abs/2501.03264v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03264v1
-
-**Abstract:**
-> The neural process (NP) is a family of computationally efficient models for
-> learning distributions over functions. However, it suffers from under-fitting
-> and shows suboptimal performance in practice. Researchers have primarily
-> focused on incorporating diverse structural inductive biases, e.g.,
-> attention or convolution, in modeling. The topic of inference suboptimality and
-> an analysis of the NP from the optimization objective perspective has hardly
-> been studied in earlier work. To fix this issue, we propose a surrogate
-> objective of the target log-likelihood of the meta dataset within the
-> expectation maximization framework. The resulting model, referred to as the
-> Self-normalized Importance weighted Neural Process (SI-NP), can learn a more
-> accurate functional prior and has an improvement guarantee concerning the
-> target log-likelihood. Experimental results show the competitive performance of
-> SI-NP over other NPs objectives and illustrate that structural inductive
-> biases, such as attention modules, can also augment our method to achieve SOTA
-> performance. Our code is available at
-> https://github.com/hhq123gogogo/SI_NPs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on improving Neural Processes (NP) via Expectation Maximization, primarily addressing under-fitting and optimization objectives, with no clear connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Listening and Seeing Again: Generative Error Correction for Audio-Visual
-  Speech Recognition](https://arxiv.org/abs/http://arxiv.org/abs/2501.04038v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04038v1
-
-**Abstract:**
-> Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech
-> Recognition (AVSR) takes audio and visual signals simultaneously to infer the
-> transcription. Recent studies have shown that Large Language Models (LLMs) can
-> be effectively used for Generative Error Correction (GER) in ASR by predicting
-> the best transcription from ASR-generated N-best hypotheses. However, these
-> LLMs lack the ability to simultaneously understand audio and visual, making the
-> GER approach challenging to apply in AVSR. In this work, we propose a novel GER
-> paradigm for AVSR, termed AVGER, that follows the concept of "listening and
-> seeing again". Specifically, we first use the powerful AVSR system to read the
-> audio and visual signals to get the N-Best hypotheses, and then use the
-> Q-former-based Multimodal Synchronous Encoder to read the audio and visual
-> information again and convert them into an audio and video compression
-> representation respectively that can be understood by LLM. Afterward, the
-> audio-visual compression representation and the N-Best hypothesis together
-> constitute a Cross-modal Prompt to guide the LLM in producing the best
-> transcription. In addition, we also proposed a Multi-Level Consistency
-> Constraint training criterion, including logits-level, utterance-level and
-> representations-level, to improve the correction accuracy while enhancing the
-> interpretability of audio and visual compression representations. The
-> experimental results on the LRS3 dataset show that our method outperforms
-> current mainstream AVSR systems. The proposed AVGER can reduce the Word Error
-> Rate (WER) by 24% compared to them. Code and models can be found at:
-> https://github.com/CircleRedRain/AVGER.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on improving Audio-Visual Speech Recognition (AVSR) with Generative Error Correction using LLMs, rather than specifically on the engineering, design, or optimization of prompts for LLMs. While prompts are mentioned (Cross-modal Prompt), they are not the central focus, and the work is more about applying LLMs within a larger AVSR system.
-
----
-
-## [A Survey on Large Language Models with some Insights on their
-  Capabilities and Limitations](https://arxiv.org/abs/http://arxiv.org/abs/2501.04040v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04040v1
-
-**Abstract:**
-> The rapid advancement of artificial intelligence, particularly with the
-> development of Large Language Models (LLMs) built on the transformer
-> architecture, has redefined the capabilities of natural language processing.
-> These models now exhibit remarkable performance across various language-related
-> tasks, such as text generation, question answering, translation, and
-> summarization, often rivaling human-like comprehension. More intriguingly, LLMs
-> have demonstrated emergent abilities extending beyond their core functions,
-> showing proficiency in tasks like commonsense reasoning, code generation, and
-> arithmetic. This survey paper explores the foundational components, scaling
-> mechanisms, and architectural strategies that drive these capabilities.
-> Emphasizing models like GPT and LLaMA, we analyze the impact of exponential
-> data and computational growth on LLM performance, while also addressing the
-> trade-offs associated with scaling. We also examine LLM applications across
-> sectors, such as healthcare, finance, education, and law, highlighting their
-> adaptability and potential to solve domain-specific challenges. Central to this
-> work are the questions of how LLMs generalize across diverse tasks, exhibit
-> planning, and reasoning abilities, and whether these emergent abilities can be
-> systematically elicited or enhanced. In particular, we provide some insights
-> into the CoT (Chain of Thought) and PoT (Plan of Thought) abilities within
-> LLMs, focusing on how pre-training data influences their emergence.
-> Additionally, we investigate LLM-modulo frameworks that integrate external
-> systems, allowing LLMs to handle complex, dynamic tasks. By analyzing these
-> factors, this paper aims to foster the ongoing discussion on the capabilities
-> and limits of LLMs, promoting their responsible development and application in
-> novel and increasingly complex environments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on surveying the capabilities and limitations of Large Language Models (LLMs), with only tangential insights into prompt-related aspects (e.g., Chain of Thought abilities), rather than centrally investigating prompt engineering for text-based interactions with LLMs as required.
-
----
-
-## [FLAME: Financial Large-Language Model Assessment and Metrics Evaluation](https://arxiv.org/abs/2501.06211v1)
-**arXiv ID:** http://arxiv.org/abs/2501.06211v1
-
-**Abstract:**
-> LLMs have revolutionized NLP and demonstrated potential across diverse
-> domains. More and more financial LLMs have been introduced for finance-specific
-> tasks, yet comprehensively assessing their value is still challenging. In this
-> paper, we introduce FLAME, a comprehensive financial LLMs evaluation system in
-> Chinese, which includes two core evaluation benchmarks: FLAME-Cer and
-> FLAME-Sce. FLAME-Cer covers 14 types of authoritative financial certifications,
-> including CPA, CFA, and FRM, with a total of approximately 16,000 carefully
-> selected questions. All questions have been manually reviewed to ensure
-> accuracy and representativeness. FLAME-Sce consists of 10 primary core
-> financial business scenarios, 21 secondary financial business scenarios, and a
-> comprehensive evaluation set of nearly 100 tertiary financial application
-> tasks. We evaluate 6 representative LLMs, including GPT-4o, GLM-4, ERNIE-4.0,
-> Qwen2.5, XuanYuan3, and the latest Baichuan4-Finance, revealing
-> Baichuan4-Finance excels other LLMs in most tasks. By establishing a
-> comprehensive and professional evaluation system, FLAME facilitates the
-> advancement of financial LLMs in Chinese contexts. Instructions for
-> participating in the evaluation are available on GitHub:
-> https://github.com/FLAME-ruc/FLAME.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on developing an evaluation system (FLAME) for assessing the performance of financial Large Language Models, rather than primarily investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts."
-}
-
----
-
-## [Operator Learning for Reconstructing Flow Fields from Sparse
-  Measurements: an Energy Transformer Approach](https://arxiv.org/abs/2501.08339v1)
-**arXiv ID:** http://arxiv.org/abs/2501.08339v1
-
-**Abstract:**
-> Machine learning methods have shown great success in various scientific
-> areas, including fluid mechanics. However, reconstruction problems, where full
-> velocity fields must be recovered from partial observations, remain
-> challenging. In this paper, we propose a novel operator learning framework for
-> solving reconstruction problems by using the Energy Transformer (ET), an
-> architecture inspired by associative memory models. We formulate reconstruction
-> as a mapping from incomplete observed data to full reconstructed fields. The
-> method is validated on three fluid mechanics examples using diverse types of
-> data: (1) unsteady 2D vortex street in flow past a cylinder using simulation
-> data; (2) high-speed under-expanded impinging supersonic jets impingement using
-> Schlieren imaging; and (3) 3D turbulent jet flow using particle tracking. The
-> results demonstrate the ability of ET to accurately reconstruct complex flow
-> fields from highly incomplete data (90\% missing), even for noisy experimental
-> measurements, with fast training and inference on a single GPU. This work
-> provides a promising new direction for tackling reconstruction problems in
-> fluid mechanics and other areas in mechanics, geophysics, weather prediction,
-> and beyond.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on operator learning for reconstructing flow fields using a novel Energy Transformer approach, with no primary focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not investigate or propose methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [VERITAS: Verifying the Performance of AI-native Transceiver Actions in
-  Base-Stations](https://arxiv.org/abs/2501.09761v1)
-**arXiv ID:** http://arxiv.org/abs/2501.09761v1
-
-**Abstract:**
-> Artificial Intelligence (AI)-native receivers prove significant performance
-> improvement in high noise regimes and can potentially reduce communication
-> overhead compared to the traditional receiver. However, their performance
-> highly depends on the representativeness of the training dataset. A major issue
-> is the uncertainty of whether the training dataset covers all test environments
-> and waveform configurations, and thus, whether the trained model is robust in
-> practical deployment conditions. To this end, we propose a joint
-> measurement-recovery framework for AI-native transceivers post deployment,
-> called VERITAS, that continuously looks for distribution shifts in the received
-> signals and triggers finite re-training spurts. VERITAS monitors the wireless
-> channel using 5G pilots fed to an auxiliary neural network that detects
-> out-of-distribution channel profile, transmitter speed, and delay spread. As
-> soon as such a change is detected, a traditional (reference) receiver is
-> activated, which runs for a period of time in parallel to the AI-native
-> receiver. Finally, VERITAS compares the bit probabilities of the AI-native and
-> the reference receivers for the same received data inputs, and decides whether
-> or not a retraining process needs to be initiated. Our evaluations reveal that
-> VERITAS can detect changes in the channel profile, transmitter speed, and delay
-> spread with 99%, 97%, and 69% accuracies, respectively, followed by timely
-> initiation of retraining for 86%, 93.3%, and 94.8% of inputs in channel
-> profile, transmitter speed, and delay spread test sets, respectively.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on verifying the performance of AI-native transceiver actions in base-stations, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Dynamics of Adversarial Attacks on Large Language Model-Based Search
-  Engines](https://arxiv.org/abs/2501.00745v1)
-**arXiv ID:** http://arxiv.org/abs/2501.00745v1
-
-**Abstract:**
-> The increasing integration of Large Language Model (LLM) based search engines
-> has transformed the landscape of information retrieval. However, these systems
-> are vulnerable to adversarial attacks, especially ranking manipulation attacks,
-> where attackers craft webpage content to manipulate the LLM's ranking and
-> promote specific content, gaining an unfair advantage over competitors. In this
-> paper, we study the dynamics of ranking manipulation attacks. We frame this
-> problem as an Infinitely Repeated Prisoners' Dilemma, where multiple players
-> strategically decide whether to cooperate or attack. We analyze the conditions
-> under which cooperation can be sustained, identifying key factors such as
-> attack costs, discount rates, attack success rates, and trigger strategies that
-> influence player behavior. We identify tipping points in the system dynamics,
-> demonstrating that cooperation is more likely to be sustained when players are
-> forward-looking. However, from a defense perspective, we find that simply
-> reducing attack success probabilities can, paradoxically, incentivize attacks
-> under certain conditions. Furthermore, defensive measures to cap the upper
-> bound of attack success rates may prove futile in some scenarios. These
-> insights highlight the complexity of securing LLM-based systems. Our work
-> provides a theoretical foundation and practical insights for understanding and
-> mitigating their vulnerabilities, while emphasizing the importance of adaptive
-> security strategies and thoughtful ecosystem design.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the vulnerability and security of LLM-based search engines to adversarial attacks, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), failing to meet the core subject requirement.
-
----
-
-## [MuQ: Self-Supervised Music Representation Learning with Mel Residual
-  Vector Quantization](https://arxiv.org/abs/2501.01108v2)
-**arXiv ID:** http://arxiv.org/abs/2501.01108v2
-
-**Abstract:**
-> Recent years have witnessed the success of foundation models pre-trained with
-> self-supervised learning (SSL) in various music informatics understanding
-> tasks, including music tagging, instrument classification, key detection, and
-> more. In this paper, we propose a self-supervised music representation learning
-> model for music understanding. Distinguished from previous studies adopting
-> random projection or existing neural codec, the proposed model, named MuQ, is
-> trained to predict tokens generated by Mel Residual Vector Quantization
-> (Mel-RVQ). Our Mel-RVQ utilizes residual linear projection structure for Mel
-> spectrum quantization to enhance the stability and efficiency of target
-> extraction and lead to better performance. Experiments in a large variety of
-> downstream tasks demonstrate that MuQ outperforms previous self-supervised
-> music representation models with only 0.9K hours of open-source pre-training
-> data. Scaling up the data to over 160K hours and adopting iterative training
-> consistently improve the model performance. To further validate the strength of
-> our model, we present MuQ-MuLan, a joint music-text embedding model based on
-> contrastive learning, which achieves state-of-the-art performance in the
-> zero-shot music tagging task on the MagnaTagATune dataset. Code and checkpoints
-> are open source in https://github.com/tencent-ailab/MuQ.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on self-supervised music representation learning and a new model (MuQ) for music understanding, with no apparent emphasis on engineering, design, or optimization of prompts for Large Language Models (LLMs) or demonstration of prompt impact on LLM output.
-
----
-
-## [Symmetries-enhanced Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2501.01136v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01136v1
-
-**Abstract:**
-> Multi-agent reinforcement learning has emerged as a powerful framework for
-> enabling agents to learn complex, coordinated behaviors but faces persistent
-> challenges regarding its generalization, scalability and sample efficiency.
-> Recent advancements have sought to alleviate those issues by embedding
-> intrinsic symmetries of the systems in the policy. Yet, most dynamical systems
-> exhibit little to no symmetries to exploit. This paper presents a novel
-> framework for embedding extrinsic symmetries in multi-agent system dynamics
-> that enables the use of symmetry-enhanced methods to address systems with
-> insufficient intrinsic symmetries, expanding the scope of equivariant learning
-> to a wide variety of MARL problems. Central to our framework is the Group
-> Equivariant Graphormer, a group-modular architecture specifically designed for
-> distributed swarming tasks. Extensive experiments on a swarm of
-> symmetry-breaking quadrotors validate the effectiveness of our approach,
-> showcasing its potential for improved generalization and zero-shot scalability.
-> Our method achieves significant reductions in collision rates and enhances task
-> success rates across a diverse range of scenarios and varying swarm sizes.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on multi-agent reinforcement learning, introducing a novel framework for embedding extrinsic symmetries, with no apparent focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs) or their direct application in text generation.
-
----
-
-## [Change Detection-Based Procedures for Piecewise Stationary MABs: A
-  Modular Approach](https://arxiv.org/abs/2501.01291v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01291v1
-
-**Abstract:**
-> Conventional Multi-Armed Bandit (MAB) algorithms are designed for stationary
-> environments, where the reward distributions associated with the arms do not
-> change with time. In many applications, however, the environment is more
-> accurately modeled as being nonstationary. In this work, piecewise stationary
-> MAB (PS-MAB) environments are investigated, in which the reward distributions
-> associated with a subset of the arms change at some change-points and remain
-> stationary between change-points. Our focus is on the asymptotic analysis of
-> PS-MABs, for which practical algorithms based on change detection (CD) have
-> been previously proposed. Our goal is to modularize the design and analysis of
-> such CD-based Bandit (CDB) procedures. To this end, we identify the
-> requirements for stationary bandit algorithms and change detectors in a CDB
-> procedure that are needed for the modularization. We assume that the rewards
-> are sub-Gaussian. Under this assumption and a condition on the separation of
-> the change-points, we show that the analysis of CDB procedures can indeed be
-> modularized, so that regret bounds can be obtained in a unified manner for
-> various combinations of change detectors and bandit algorithms. Through this
-> analysis, we develop new modular CDB procedures that are order-optimal. We
-> compare the performance of our modular CDB procedures with various other
-> methods in simulations.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on Multi-Armed Bandit (MAB) algorithms and change detection procedures, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria."
-}
-
----
-
-## [Quantifying A Firm's AI Engagement: Constructing Objective, Data-Driven,
-  AI Stock Indices Using 10-K Filings](https://arxiv.org/abs/2501.01763v1)
-**arXiv ID:** http://arxiv.org/abs/2501.01763v1
-
-**Abstract:**
-> Following an analysis of existing AI-related exchange-traded funds (ETFs), we
-> reveal the selection criteria for determining which stocks qualify as
-> AI-related are often opaque and rely on vague phrases and subjective judgments.
-> This paper proposes a new, objective, data-driven approach using natural
-> language processing (NLP) techniques to classify AI stocks by analyzing annual
-> 10-K filings from 3,395 NASDAQ-listed firms between 2011 and 2023. This
-> analysis quantifies each company's engagement with AI through binary indicators
-> and weighted AI scores based on the frequency and context of AI-related terms.
-> Using these metrics, we construct four AI stock indices-the Equally Weighted AI
-> Index (AII), the Size-Weighted AI Index (SAII), and two Time-Discounted AI
-> Indices (TAII05 and TAII5X)-offering different perspectives on AI investment.
-> We validate our methodology through an event study on the launch of OpenAI's
-> ChatGPT, demonstrating that companies with higher AI engagement saw
-> significantly greater positive abnormal returns, with analyses supporting the
-> predictive power of our AI measures. Our indices perform on par with or surpass
-> 14 existing AI-themed ETFs and the Nasdaq Composite Index in risk-return
-> profiles, market responsiveness, and overall performance, achieving higher
-> average daily returns and risk-adjusted metrics without increased volatility.
-> These results suggest our NLP-based approach offers a reliable,
-> market-responsive, and cost-effective alternative to existing AI-related ETF
-> products. Our innovative methodology can also guide investors, asset managers,
-> and policymakers in using corporate data to construct other thematic
-> portfolios, contributing to a more transparent, data-driven, and competitive
-> approach.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on constructing AI stock indices using NLP techniques to analyze corporate filings, not on prompt engineering for Large Language Models (LLMs). It does not investigate, analyze, or propose methods for improving LLM performance through prompt manipulation, nor does it provide concrete examples of prompts and their impact on LLM output."
-}
-
----
-
-## [KANS: Knowledge Discovery Graph Attention Network for Soft Sensing in
-  Multivariate Industrial Processes](https://arxiv.org/abs/2501.02015v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02015v1
-
-**Abstract:**
-> Soft sensing of hard-to-measure variables is often crucial in industrial
-> processes. Current practices rely heavily on conventional modeling techniques
-> that show success in improving accuracy. However, they overlook the non-linear
-> nature, dynamics characteristics, and non-Euclidean dependencies between
-> complex process variables. To tackle these challenges, we present a framework
-> known as a Knowledge discovery graph Attention Network for effective Soft
-> sensing (KANS). Unlike the existing deep learning soft sensor models, KANS can
-> discover the intrinsic correlations and irregular relationships between the
-> multivariate industrial processes without a predefined topology. First, an
-> unsupervised graph structure learning method is introduced, incorporating the
-> cosine similarity between different sensor embedding to capture the
-> correlations between sensors. Next, we present a graph attention-based
-> representation learning that can compute the multivariate data parallelly to
-> enhance the model in learning complex sensor nodes and edges. To fully explore
-> KANS, knowledge discovery analysis has also been conducted to demonstrate the
-> interpretability of the model. Experimental results demonstrate that KANS
-> significantly outperforms all the baselines and state-of-the-art methods in
-> soft sensing performance. Furthermore, the analysis shows that KANS can find
-> sensors closely related to different process variables without domain
-> knowledge, significantly improving soft sensing accuracy.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a Knowledge Discovery Graph Attention Network (KANS) for soft sensing in industrial processes, not on prompt engineering for Large Language Models (LLMs). It does not investigate, analyze, or propose methods for improving LLM performance through textual input prompt manipulation, nor does it provide examples of prompts and their impact on LLM output.
-
----
-
-## [Benchmark Evaluations, Applications, and Challenges of Large Vision
-  Language Models: A Survey](https://arxiv.org/abs/2501.02189v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02189v2
-
-**Abstract:**
-> Multimodal Vision Language Models (VLMs) have emerged as a transformative
-> technology at the intersection of computer vision and natural language
-> processing, enabling machines to perceive and reason about the world through
-> both visual and textual modalities. For example, models such as CLIP, Claude,
-> and GPT-4V demonstrate strong reasoning and understanding abilities on visual
-> and textual data and beat classical single modality vision models on zero-shot
-> classification. Despite their rapid advancements in research and growing
-> popularity in applications, a comprehensive survey of existing studies on VLMs
-> is notably lacking, particularly for researchers aiming to leverage VLMs in
-> their specific domains. To this end, we provide a systematic overview of VLMs
-> in the following aspects: model information of the major VLMs developed over
-> the past five years (2019-2024); the main architectures and training methods of
-> these VLMs; summary and categorization of the popular benchmarks and evaluation
-> metrics of VLMs; the applications of VLMs including embodied agents, robotics,
-> and video generation; the challenges and issues faced by current VLMs such as
-> hallucination, fairness, and safety. Detailed collections including papers and
-> model repository links are listed in
-> https://github.com/zli12321/Awesome-VLM-Papers-And-Models.git.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on Multimodal Vision Language Models (VLMs), their applications, and challenges, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) as required by the criteria.
-
----
-
-## [Optimizing Small Language Models for In-Vehicle Function-Calling](https://arxiv.org/abs/2501.02342v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02342v1
-
-**Abstract:**
-> We propose a holistic approach for deploying Small Language Models (SLMs) as
-> function-calling agents within vehicles as edge devices, offering a more
-> flexible and robust alternative to traditional rule-based systems. By
-> leveraging SLMs, we simplify vehicle control mechanisms and enhance the user
-> experience. Given the in-vehicle hardware constraints, we apply
-> state-of-the-art model compression techniques, including structured pruning,
-> healing, and quantization, ensuring that the model fits within the resource
-> limitations while maintaining acceptable performance. Our work focuses on
-> optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best
-> practices for enabling embedded models, including compression, task-specific
-> fine-tuning, and vehicle integration. We demonstrate that, despite significant
-> reduction in model size which removes up to 2 billion parameters from the
-> original model, our approach preserves the model's ability to handle complex
-> in-vehicle tasks accurately and efficiently. Furthermore, by executing the
-> model in a lightweight runtime environment, we achieve a generation speed of 11
-> tokens per second, making real-time, on-device inference feasible without
-> hardware acceleration. Our results demonstrate the potential of SLMs to
-> transform vehicle control systems, enabling more intuitive interactions between
-> users and their vehicles for an enhanced driving experience.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on optimizing Small Language Models (SLMs) for deployment in vehicles, discussing model compression and fine-tuning, rather than prompt engineering for Large Language Models (LLMs) and its impact on textual input and output.
-
----
-
-## [Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers](https://arxiv.org/abs/2501.02393v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02393v2
-
-**Abstract:**
-> We present an approach to modifying Transformer architectures by integrating
-> graph-aware relational reasoning into the attention mechanism, merging concepts
-> from graph neural networks and language modeling. Building on the inherent
-> connection between attention and graph theory, we reformulate the Transformer's
-> attention mechanism as a graph operation and propose Graph-Aware Isomorphic
-> Attention. This method leverages advanced graph modeling strategies, including
-> Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA),
-> to enrich the representation of relational structures. Our approach captures
-> complex dependencies and generalizes across tasks, as evidenced by a reduced
-> generalization gap and improved learning performance. Additionally, we expand
-> the concept of graph-aware attention to introduce Sparse GIN-Attention, a
-> fine-tuning approach that employs sparse GINs. By interpreting attention
-> matrices as sparse adjacency graphs, this technique enhances the adaptability
-> of pre-trained foundational models with minimal computational overhead,
-> endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning
-> achieves improved training dynamics and better generalization compared to
-> alternative methods like low-rank adaption (LoRA). We discuss latent graph-like
-> structures within traditional attention mechanisms, offering a new lens through
-> which Transformers can be understood. By evolving Transformers as hierarchical
-> GIN models for relational reasoning, this perspective suggests profound
-> implications for foundational model development, enabling the design of
-> architectures that dynamically adapt to both local and global dependencies.
-> Applications in bioinformatics, materials science, language modeling, and
-> beyond could benefit from this synthesis of relational and sequential data
-> modeling, setting the stage for interpretable and generalizable modeling
-> strategies.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on modifying Transformer architectures by integrating graph-aware relational reasoning, which aligns with developing new LLM architectures, violating the 'MUST NOT' criterion 1. It does not centrally address prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Toward Inclusive Educational AI: Auditing Frontier LLMs through a
-  Multiplexity Lens](https://arxiv.org/abs/2501.03259v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03259v1
-
-**Abstract:**
-> As large language models (LLMs) like GPT-4 and Llama 3 become integral to
-> educational contexts, concerns are mounting over the cultural biases, power
-> imbalances, and ethical limitations embedded within these technologies. Though
-> generative AI tools aim to enhance learning experiences, they often reflect
-> values rooted in Western, Educated, Industrialized, Rich, and Democratic
-> (WEIRD) cultural paradigms, potentially sidelining diverse global perspectives.
-> This paper proposes a framework to assess and mitigate cultural bias within
-> LLMs through the lens of applied multiplexity. Multiplexity, inspired by
-> Senturk et al. and rooted in Islamic and other wisdom traditions, emphasizes
-> the coexistence of diverse cultural viewpoints, supporting a multi-layered
-> epistemology that integrates both empirical sciences and normative values. Our
-> analysis reveals that LLMs frequently exhibit cultural polarization, with
-> biases appearing in both overt responses and subtle contextual cues. To address
-> inherent biases and incorporate multiplexity in LLMs, we propose two
-> strategies: *Contextually-Implemented Multiplex LLMs*, which embed
-> multiplex principles directly into the system prompt, influencing LLM outputs
-> at a foundational level and independent of individual prompts, and
-> *Multi-Agent System (MAS)-Implemented Multiplex LLMs*, where multiple
-> LLM agents, each representing distinct cultural viewpoints, collaboratively
-> generate a balanced, synthesized response. Our findings demonstrate that as
-> mitigation strategies evolve from contextual prompting to MAS-implementation,
-> cultural inclusivity markedly improves, evidenced by a significant rise in the
-> Perspectives Distribution Score (PDS) and a PDS Entropy increase from 3.25\% at
-> baseline to 98\% with the MAS-Implemented Multiplex LLMs. Sentiment analysis
-> further shows a shift towards positive sentiment across cultures,...
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on auditing and mitigating cultural biases in LLMs through a multiplexity lens, rather than specifically engineering or optimizing prompts for improving LLM performance. While it mentions prompt manipulation (Contextually-Implemented Multiplex LLMs), this is secondary to the overarching goal of bias mitigation and not the core subject of the paper."
-}
-
----
-
-## [LLM Content Moderation and User Satisfaction: Evidence from Response
-  Refusals in Chatbot Arena](https://arxiv.org/abs/2501.03266v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03266v1
-
-**Abstract:**
-> LLM safety and ethical alignment are widely discussed, but the impact of
-> content moderation on user satisfaction remains underexplored. To address this,
-> we analyze nearly 50,000 Chatbot Arena response-pairs using a novel fine-tuned
-> RoBERTa model, that we trained on hand-labeled data to disentangle refusals due
-> to ethical concerns from other refusals due to technical disabilities or lack
-> of information. Our findings reveal a significant refusal penalty on content
-> moderation, with users choosing ethical-based refusals roughly one-fourth as
-> often as their preferred LLM response compared to standard responses. However,
-> the context and phrasing play critical roles: refusals on highly sensitive
-> prompts, such as illegal content, achieve higher win rates than less sensitive
-> ethical concerns, and longer responses closely aligned with the prompt perform
-> better. These results emphasize the need for nuanced moderation strategies that
-> balance ethical safeguards with user satisfaction. Moreover, we find that the
-> refusal penalty is notably lower in evaluations using the LLM-as-a-Judge
-> method, highlighting discrepancies between user and automated assessments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on LLM content moderation, user satisfaction, and refining a RoBERTa model for analyzing refusals, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) and their direct impact on LLM output.
-
----
-
-## [Cyber Shadows: Neutralizing Security Threats with AI and Targeted Policy
-  Measures](https://arxiv.org/abs/2501.09025v1)
-**arXiv ID:** http://arxiv.org/abs/2501.09025v1
-
-**Abstract:**
-> The digital age, driven by the AI revolution, brings significant
-> opportunities but also conceals security threats, which we refer to as cyber
-> shadows. These threats pose risks at individual, organizational, and societal
-> levels. This paper examines the systemic impact of these cyber threats and
-> proposes a comprehensive cybersecurity strategy that integrates AI-driven
-> solutions, such as Intrusion Detection Systems (IDS), with targeted policy
-> interventions. By combining technological and regulatory measures, we create a
-> multilevel defense capable of addressing both direct threats and indirect
-> negative externalities. We emphasize that the synergy between AI-driven
-> solutions and policy interventions is essential for neutralizing cyber threats
-> and mitigating their negative impact on the digital economy. Finally, we
-> underscore the need for continuous adaptation of these strategies, especially
-> in response to the rapid advancement of autonomous AI-driven attacks, to ensure
-> the creation of secure and resilient digital ecosystems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on a comprehensive cybersecurity strategy using AI-driven solutions and policy interventions, with no clear emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria.
-
----
-
-## [Zero-Shot Statistical Tests for LLM-Generated Text Detection using
-  Finite Sample Concentration Inequalities](https://arxiv.org/abs/http://arxiv.org/abs/2501.02406v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02406v2
-
-**Abstract:**
-> Verifying the provenance of content is crucial to the function of many
-> organizations, e.g., educational institutions, social media platforms, firms,
-> etc. This problem is becoming increasingly difficult as text generated by Large
-> Language Models (LLMs) becomes almost indistinguishable from human-generated
-> content. In addition, many institutions utilize in-house LLMs and want to
-> ensure that external, non-sanctioned LLMs do not produce content within the
-> institution. In this paper, we answer the following question: Given a piece of
-> text, can we identify whether it was produced by LLM $A$ or $B$ (where $B$ can
-> be a human)? We model LLM-generated text as a sequential stochastic process
-> with complete dependence on history and design zero-shot statistical tests to
-> distinguish between (i) the text generated by two different sets of LLMs $A$
-> (in-house) and $B$ (non-sanctioned) and also (ii) LLM-generated and
-> human-generated texts. We prove that the type I and type II errors for our
-> tests decrease exponentially in the text length. In designing our tests, we
-> derive concentration inequalities on the difference between log-perplexity and
-> the average entropy of the string under $A$. Specifically, for a given string,
-> we demonstrate that if the string is generated by $A$, the log-perplexity of
-> the string under $A$ converges to the average entropy of the string under $A$,
-> except with an exponentially small probability in string length. We also show
-> that if $B$ generates the text, except with an exponentially small probability
-> in string length, the log-perplexity of the string under $A$ converges to the
-> average cross-entropy of $B$ and $A$. Lastly, we present preliminary
-> experimental results to support our theoretical results. By enabling guaranteed
-> (with high probability) finding of the origin of harmful LLM-generated text
-> with arbitrary size, we can help combat misinformation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on statistical tests for detecting the origin of LLM-generated text, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models. It does not investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
diff --git a/prompt-papers.md b/prompt-papers.md
deleted file mode 100644
index 1e46ca5..0000000
--- a/prompt-papers.md
+++ /dev/null
@@ -1,5903 +0,0 @@
-# Accepted Papers
-
-## [LLMPC: Large Language Model Predictive Control](https://arxiv.org/abs/http://arxiv.org/abs/2501.02486v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02486v1
-
-**Abstract:**
-> Recent advancements in prompting techniques for Large Language Models (LLMs)
-> have improved their reasoning, planning, and action abilities. This paper
-> examines these prompting techniques through the lens of model predictive
-> control (MPC). We show that LLMs act as implicit planning cost function
-> minimizers when planning prompts are used. Under our framework we demonstrate
-> that LLM planning performance can be improved further by incorporating real
-> planning cost functions and evaluators.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "ACCEPT",
-    "explanation": "The paper focuses primarily on optimizing LLM performance through the manipulation of textual input prompts (planning prompts), provides a framework (LLMPC) for systematic prompt development, and demonstrates its impact on LLM output, aligning with the core subject of prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-# Rejected Papers
-
-## [Co-Activation Graph Analysis of Safety-Verified and Explainable Deep
-  Reinforcement Learning Policies](https://arxiv.org/abs/http://arxiv.org/abs/2501.03142v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03142v1
-
-**Abstract:**
-> Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors
-> and are challenging to interpret. To address these challenges, we combine RL
-> policy model checking--a technique for determining whether RL policies exhibit
-> unsafe behaviors--with co-activation graph analysis--a method that maps neural
-> network inner workings by analyzing neuron activation patterns--to gain insight
-> into the safe RL policy's sequential decision-making. This combination lets us
-> interpret the RL policy's inner workings for safe decision-making. We
-> demonstrate its applicability in various experiments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on deep reinforcement learning (RL) policies, model checking, and co-activation graph analysis, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Turn-based Multi-Agent Reinforcement Learning Model Checking](https://arxiv.org/abs/http://arxiv.org/abs/2501.03187v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03187v1
-
-**Abstract:**
-> In this paper, we propose a novel approach for verifying the compliance of
-> turn-based multi-agent reinforcement learning (TMARL) agents with complex
-> requirements in stochastic multiplayer games. Our method overcomes the
-> limitations of existing verification approaches, which are inadequate for
-> dealing with TMARL agents and not scalable to large games with multiple agents.
-> Our approach relies on tight integration of TMARL and a verification technique
-> referred to as model checking. We demonstrate the effectiveness and scalability
-> of our technique through experiments in different types of environments. Our
-> experiments show that our method is suited to verify TMARL agents and scales
-> better than naive monolithic model checking.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on verifying compliance of turn-based multi-agent reinforcement learning agents, with no clear emphasis on prompt engineering for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria of focusing on engineering, design, or optimization of prompts specifically for LLMs.
-
----
-
-## [Neural Deconstruction Search for Vehicle Routing Problems](https://arxiv.org/abs/http://arxiv.org/abs/2501.03715v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03715v1
-
-**Abstract:**
-> Autoregressive construction approaches generate solutions to vehicle routing
-> problems in a step-by-step fashion, leading to high-quality solutions that are
-> nearing the performance achieved by handcrafted, operations research
-> techniques. In this work, we challenge the conventional paradigm of sequential
-> solution construction and introduce an iterative search framework where
-> solutions are instead deconstructed by a neural policy. Throughout the search,
-> the neural policy collaborates with a simple greedy insertion algorithm to
-> rebuild the deconstructed solutions. Our approach surpasses the performance of
-> state-of-the-art operations research methods across three challenging vehicle
-> routing problems of various problem sizes.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on solving Vehicle Routing Problems using a neural search framework, which falls under the excluded categories (primarily concerned with applications other than text generation driven by LLMs, and specifically mentions automotive/self-driving/operations research subjects). It also does not meet the 'MUST' criteria of focusing primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs).
-
----
-
-## [A completely uniform transformer for parity](https://arxiv.org/abs/http://arxiv.org/abs/2501.02535v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02535v1
-
-**Abstract:**
-> We construct a 3-layer constant-dimension transformer, recognizing the parity
-> language, where neither parameter matrices nor the positional encoding depend
-> on the input length. This improves upon a construction of Chiang and Cholak who
-> use a positional encoding, depending on the input length (but their
-> construction has 2 layers).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on constructing a uniform transformer for recognizing parity language, which aligns with developing new LLM architectures or training methods, violating the 'MUST NOT' criteria 1. Additionally, it does not demonstrate prompt engineering for text-based interactions with LLMs as its core subject.
-
----
-
-## [Test-time Computing: from System-1 Thinking to System-2 Thinking](https://arxiv.org/abs/http://arxiv.org/abs/2501.02497v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02497v1
-
-**Abstract:**
-> The remarkable performance of the o1 model in complex reasoning demonstrates
-> that test-time computing scaling can further unlock the model's potential,
-> enabling powerful System-2 thinking. However, there is still a lack of
-> comprehensive surveys for test-time computing scaling. We trace the concept of
-> test-time computing back to System-1 models. In System-1 models, test-time
-> computing addresses distribution shifts and improves robustness and
-> generalization through parameter updating, input modification, representation
-> editing, and output calibration. In System-2 models, it enhances the model's
-> reasoning ability to solve complex problems through repeated sampling,
-> self-correction, and tree search. We organize this survey according to the
-> trend of System-1 to System-2 thinking, highlighting the key role of test-time
-> computing in the transition from System-1 models to weak System-2 models, and
-> then to strong System-2 models. We also point out a few possible future
-> directions.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the primary focus criteria as it discusses test-time computing scaling for enhancing model performance (pertaining to System-1 and System-2 thinking) without specifically addressing prompt engineering for Large Language Models (LLMs) or demonstrating the impact of textual input prompts on LLM output."
-}
-
----
-
-## [KG-CF: Knowledge Graph Completion with Context Filtering under the
-  Guidance of Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02711v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02711v1
-
-**Abstract:**
-> Large Language Models (LLMs) have shown impressive performance in various
-> tasks, including knowledge graph completion (KGC). However, current studies
-> mostly apply LLMs to classification tasks, like identifying missing triplets,
-> rather than ranking-based tasks, where the model ranks candidate entities based
-> on plausibility. This focus limits the practical use of LLMs in KGC, as
-> real-world applications prioritize highly plausible triplets. Additionally,
-> while graph paths can help infer the existence of missing triplets and improve
-> completion accuracy, they often contain redundant information. To address these
-> issues, we propose KG-CF, a framework tailored for ranking-based KGC tasks.
-> KG-CF leverages LLMs' reasoning abilities to filter out irrelevant contexts,
-> achieving superior results on real-world datasets. The code and datasets are
-> available at \url{https://anonymous.4open.science/r/KG-CF}.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on leveraging LLMs for knowledge graph completion (ranking-based tasks) rather than on the engineering, design, or optimization of prompts specifically for LLMs. While LLMs are used, the central concern is the application (KGC) and not prompt engineering techniques, methods, or their impact on LLM output."
-}
-
----
-
-## [Artificial Intelligence in Creative Industries: Advances Prior to 2025](https://arxiv.org/abs/http://arxiv.org/abs/2501.02725v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02725v1
-
-**Abstract:**
-> The rapid advancements in artificial intelligence (AI), particularly in
-> generative AI and large language models (LLMs), have profoundly impacted the
-> creative industries by enabling innovative content creation, enhancing
-> workflows, and democratizing access to creative tools. This paper explores the
-> significant technological shifts since our previous review in 2022,
-> highlighting how these developments have expanded creative opportunities and
-> efficiency. These technological advancements have enhanced the capabilities of
-> text-to-image, text-to-video, and multimodal generation technologies. In
-> particular, key breakthroughs in LLMs have established new benchmarks in
-> conversational AI, while advancements in image generators have revolutionized
-> content creation. We also discuss AI integration into post-production
-> workflows, which has significantly accelerated and refined traditional
-> processes. Despite these innovations, challenges remain, particularly for the
-> media industry, due to the demands on communication traffic from creative
-> content. We therefore include data compression and quality assessment in this
-> paper. Furthermore, we highlight the trend toward unified AI frameworks capable
-> of addressing multiple creative tasks and underscore the importance of human
-> oversight to mitigate AI-generated inaccuracies. Finally, we explore AI's
-> future potential in the creative sector, stressing the need to navigate
-> emerging challenges to maximize its benefits while addressing associated risks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on the broader impact of AI in creative industries, mentioning LLMs and generative AI as components, rather than specifically concentrating on prompt engineering for LLMs as required by the criteria.
-
----
-
-## [Multi-Agent Path Finding under Limited Communication Range Constraint
-  via Dynamic Leading](https://arxiv.org/abs/http://arxiv.org/abs/2501.02770v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02770v1
-
-**Abstract:**
-> This paper proposes a novel framework to handle a multi-agent path finding
-> problem under a limited communication range constraint, where all agents must
-> have a connected communication channel to the rest of the team. Many existing
-> approaches to multi-agent path finding (e.g., leader-follower platooning)
-> overcome computational challenges of planning in this domain by planning one
-> agent at a time in a fixed order. However, fixed leader-follower approaches can
-> become stuck during planning, limiting their practical utility in dense-clutter
-> environments. To overcome this limitation, we develop dynamic leading
-> multi-agent path finding, which allows for dynamic reselection of the leading
-> agent during path planning whenever progress cannot be made. The experiments
-> show the efficiency of our framework, which can handle up to 25 agents with
-> more than 90% success-rate across five environment types where baselines
-> routinely fail.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on multi-agent path finding under limited communication constraints, utilizing dynamic leading, and does not meet any of the 'MUST' criteria, particularly lacking primary focus on prompt engineering for Large Language Models (LLMs) and manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [SenseRAG: Constructing Environmental Knowledge Bases with Proactive
-  Querying for LLM-Based Autonomous Driving](https://arxiv.org/abs/http://arxiv.org/abs/2501.03535v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03535v2
-
-**Abstract:**
-> This study addresses the critical need for enhanced situational awareness in
-> autonomous driving (AD) by leveraging the contextual reasoning capabilities of
-> large language models (LLMs). Unlike traditional perception systems that rely
-> on rigid, label-based annotations, it integrates real-time, multimodal sensor
-> data into a unified, LLMs-readable knowledge base, enabling LLMs to dynamically
-> understand and respond to complex driving environments. To overcome the
-> inherent latency and modality limitations of LLMs, a proactive
-> Retrieval-Augmented Generation (RAG) is designed for AD, combined with a
-> chain-of-thought prompting mechanism, ensuring rapid and context-rich
-> understanding. Experimental results using real-world Vehicle-to-everything
-> (V2X) datasets demonstrate significant improvements in perception and
-> prediction performance, highlighting the potential of this framework to enhance
-> safety, adaptability, and decision-making in next-generation AD systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing autonomous driving (AD) systems with LLMs, rather than prompt engineering for text-based interactions with LLMs. Although it mentions a 'chain-of-thought prompting mechanism', the core subject is the AD system's perception and prediction performance, not novel prompt engineering techniques for LLMs.
-
----
-
-## [STContext: A Multifaceted Dataset for Developing Context-aware
-  Spatio-temporal Crowd Mobility Prediction Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03583v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03583v1
-
-**Abstract:**
-> In smart cities, context-aware spatio-temporal crowd flow prediction (STCFP)
-> models leverage contextual features (e.g., weather) to identify unusual crowd
-> mobility patterns and enhance prediction accuracy. However, the best practice
-> for incorporating contextual features remains unclear due to inconsistent usage
-> of contextual features in different papers. Developing a multifaceted dataset
-> with rich types of contextual features and STCFP scenarios is crucial for
-> establishing a principled context modeling paradigm. Existing open crowd flow
-> datasets lack an adequate range of contextual features, which poses an urgent
-> requirement to build a multifaceted dataset to fill these research gaps. To
-> this end, we create STContext, a multifaceted dataset for developing
-> context-aware STCFP models. Specifically, STContext provides nine
-> spatio-temporal datasets across five STCFP scenarios and includes ten
-> contextual features, including weather, air quality index, holidays, points of
-> interest, road networks, etc. Besides, we propose a unified workflow for
-> incorporating contextual features into deep STCFP methods, with steps including
-> feature transformation, dependency modeling, representation fusion, and
-> training strategies. Through extensive experiments, we have obtained several
-> useful guidelines for effective context modeling and insights for future
-> research. The STContext is open-sourced at
-> https://github.com/Liyue-Chen/STContext.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it focuses on developing a multifaceted dataset for context-aware spatio-temporal crowd mobility prediction models, which falls outside the specified criteria.
-
----
-
-## [Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and
-  Editable Policies](https://arxiv.org/abs/http://arxiv.org/abs/2501.03888v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03888v1
-
-**Abstract:**
-> Although deep reinforcement learning has been shown to be effective, the
-> model's black-box nature presents barriers to direct policy interpretation. To
-> address this problem, we propose a neuro-symbolic approach called neural DNF-MT
-> for end-to-end policy learning. The differentiable nature of the neural DNF-MT
-> model enables the use of deep actor-critic algorithms for training. At the same
-> time, its architecture is designed so that trained models can be directly
-> translated into interpretable policies expressed as standard (bivalent or
-> probabilistic) logic programs. Moreover, additional layers can be included to
-> extract abstract features from complex observations, acting as a form of
-> predicate invention. The logic representations are highly interpretable, and we
-> show how the bivalent representations of deterministic policies can be edited
-> and incorporated back into a neural model, facilitating manual intervention and
-> adaptation of learned policies. We evaluate our approach on a range of tasks
-> requiring learning deterministic or stochastic behaviours from various forms of
-> observations. Our empirical results show that our neural DNF-MT model performs
-> at the level of competing black-box methods whilst providing interpretable
-> policies.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on developing a neuro-symbolic approach for interpretable policy learning in reinforcement learning, with no primary focus on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides](https://arxiv.org/abs/http://arxiv.org/abs/2501.03936v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03936v1
-
-**Abstract:**
-> Automatically generating presentations from documents is a challenging task
-> that requires balancing content quality, visual design, and structural
-> coherence. Existing methods primarily focus on improving and evaluating the
-> content quality in isolation, often overlooking visual design and structural
-> coherence, which limits their practical applicability. To address these
-> limitations, we propose PPTAgent, which comprehensively improves presentation
-> generation through a two-stage, edit-based approach inspired by human
-> workflows. PPTAgent first analyzes reference presentations to understand their
-> structural patterns and content schemas, then drafts outlines and generates
-> slides through code actions to ensure consistency and alignment. To
-> comprehensively evaluate the quality of generated presentations, we further
-> introduce PPTEval, an evaluation framework that assesses presentations across
-> three dimensions: Content, Design, and Coherence. Experiments show that
-> PPTAgent significantly outperforms traditional automatic presentation
-> generation methods across all three dimensions. The code and data are available
-> at https://github.com/icip-cas/PPTAgent.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on generating presentations (image/video generation) and improving content quality, visual design, and structural coherence, rather than specifically engineering or optimizing text-based input prompts for Large Language Models (LLMs)."
-}
-
----
-
-## [Implementing Systemic Thinking for Automatic Schema Matching: An
-  Agent-Based Modeling Approach](https://arxiv.org/abs/http://arxiv.org/abs/2501.04136v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04136v1
-
-**Abstract:**
-> Several approaches are proposed to deal with the problem of the Automatic
-> Schema Matching (ASM). The challenges and difficulties caused by the complexity
-> and uncertainty characterizing both the process and the outcome of Schema
-> Matching motivated us to investigate how bio-inspired emerging paradigm can
-> help with understanding, managing, and ultimately overcoming those challenges.
-> In this paper, we explain how we approached Automatic Schema Matching as a
-> systemic and Complex Adaptive System (CAS) and how we modeled it using the
-> approach of Agent-Based Modeling and Simulation (ABMS). This effort gives birth
-> to a tool (prototype) for schema matching called Reflex-SMAS. A set of
-> experiments demonstrates the viability of our approach on two main aspects: (i)
-> effectiveness (increasing the quality of the found matchings) and (ii)
-> efficiency (reducing the effort required for this efficiency). Our approach
-> represents a significant paradigm-shift, in the field of Automatic Schema
-> Matching.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary focus criteria, as it concentrates on Automatic Schema Matching using Agent-Based Modeling and Simulation, with no apparent emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [AI-Driven Reinvention of Hydrological Modeling for Accurate Predictions
-  and Interpretation to Transform Earth System Modeling](https://arxiv.org/abs/http://arxiv.org/abs/2501.04733v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04733v1
-
-**Abstract:**
-> Traditional equation-driven hydrological models often struggle to accurately
-> predict streamflow in challenging regional Earth systems like the Tibetan
-> Plateau, while hybrid and existing algorithm-driven models face difficulties in
-> interpreting hydrological behaviors. This work introduces HydroTrace, an
-> algorithm-driven, data-agnostic model that substantially outperforms these
-> approaches, achieving a Nash-Sutcliffe Efficiency of 98% and demonstrating
-> strong generalization on unseen data. Moreover, HydroTrace leverages advanced
-> attention mechanisms to capture spatial-temporal variations and
-> feature-specific impacts, enabling the quantification and spatial resolution of
-> streamflow partitioning as well as the interpretation of hydrological behaviors
-> such as glacier-snow-streamflow interactions and monsoon dynamics.
-> Additionally, a large language model (LLM)-based application allows users to
-> easily understand and apply HydroTrace's insights for practical purposes. These
-> advancements position HydroTrace as a transformative tool in hydrological and
-> broader Earth system modeling, offering enhanced prediction accuracy and
-> interpretability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a new hydrological modeling approach (HydroTrace) and its application in Earth system modeling, with the Large Language Model (LLM) being a secondary component for interpreting insights, rather than the primary subject being prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Rethinking IDE Customization for Enhanced HAX: A Hyperdimensional
-  Perspective](https://arxiv.org/abs/http://arxiv.org/abs/2501.02491v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02491v1
-
-**Abstract:**
-> As Integrated Development Environments (IDEs) increasingly integrate
-> Artificial Intelligence, Software Engineering faces both benefits like
-> productivity gains and challenges like mismatched user preferences. We propose
-> Hyper-Dimensional (HD) vector spaces to model Human-Computer Interaction,
-> focusing on user actions, stylistic preferences, and project context. These
-> contributions aim to inspire further research on applying HD computing in IDE
-> design.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on IDE customization using Hyperdimensional vector spaces for Human-Computer Interaction, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on
-  a Specific Domain](https://arxiv.org/abs/http://arxiv.org/abs/2501.02546v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02546v1
-
-**Abstract:**
-> Word sense disambiguation (WSD) is one of the main challenges in
-> Computational Linguistics. TreeMatch is a WSD system originally developed using
-> data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has
-> been adapted for use in SemEval 2010 Task 17 (All-words Word Sense
-> Disambiguation on a Specific Domain). The system is based on a fully
-> unsupervised method using dependency knowledge drawn from a domain specific
-> knowledge base that was built for this task. When evaluated on the task, the
-> system precision performs above the Most Frequent Selection baseline.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on Word Sense Disambiguation (WSD) using dependency knowledge and a domain-specific knowledge base, with no indication of primarily investigating, analyzing, or proposing methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts.
-
----
-
-## [Efficient Architectures for High Resolution Vision-Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02584v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02584v1
-
-**Abstract:**
-> Vision-Language Models (VLMs) have recently experienced significant
-> advancements. However, challenges persist in the accurate recognition of fine
-> details within high resolution images, which limits performance in multiple
-> tasks. This work introduces Pheye, a novel architecture that efficiently
-> processes high-resolution images while training fewer parameters than similarly
-> sized VLMs. Notably, Pheye achieves a high efficiency while maintaining strong
-> performance, particularly in tasks that demand fine-grained image understanding
-> and/or the handling of scene-text.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a novel architecture (Pheye) for Vision-Language Models (VLMs) to process high-resolution images, which aligns with excluded criteria: developing new LLM architectures and being concerned with applications of generative AI other than text generation driven by LLMs (specifically, vision-language tasks).
-
----
-
-## [Enhancing Robot Route Optimization in Smart Logistics with Transformer
-  and GNN Integration](https://arxiv.org/abs/http://arxiv.org/abs/2501.02749v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02749v1
-
-**Abstract:**
-> This research delves into advanced route optimization for robots in smart
-> logistics, leveraging a fusion of Transformer architectures, Graph Neural
-> Networks (GNNs), and Generative Adversarial Networks (GANs). The approach
-> utilizes a graph-based representation encompassing geographical data, cargo
-> allocation, and robot dynamics, addressing both spatial and resource
-> limitations to refine route efficiency. Through extensive testing with
-> authentic logistics datasets, the proposed method achieves notable
-> improvements, including a 15% reduction in travel distance, a 20% boost in time
-> efficiency, and a 10% decrease in energy consumption. These findings highlight
-> the algorithm's effectiveness, promoting enhanced performance in intelligent
-> logistics operations.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the integration of Transformer, GNN, and GAN for robot route optimization in smart logistics, with no apparent emphasis on prompt engineering, manipulation of textual input prompts, or the interaction with Large Language Models (LLMs) for text generation.
-
----
-
-## [Key-value memory in the brain](https://arxiv.org/abs/http://arxiv.org/abs/2501.02950v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02950v1
-
-**Abstract:**
-> Classical models of memory in psychology and neuroscience rely on
-> similarity-based retrieval of stored patterns, where similarity is a function
-> of retrieval cues and the stored patterns. While parsimonious, these models do
-> not allow distinct representations for storage and retrieval, despite their
-> distinct computational demands. Key-value memory systems, in contrast,
-> distinguish representations used for storage (values) and those used for
-> retrieval (keys). This allows key-value memory systems to optimize
-> simultaneously for fidelity in storage and discriminability in retrieval. We
-> review the computational foundations of key-value memory, its role in modern
-> machine learning systems, related ideas from psychology and neuroscience,
-> applications to a number of empirical puzzles, and possible biological
-> implementations.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate methods for improving LLM performance through prompt manipulation. Instead, it discusses key-value memory systems in the context of psychology, neuroscience, and machine learning, with no apparent connection to LLM prompt engineering.
-
----
-
-## [Putnam's Critical and Explanatory Tendencies Interpreted from a Machine
-  Learning Perspective](https://arxiv.org/abs/http://arxiv.org/abs/2501.03026v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03026v1
-
-**Abstract:**
-> Making sense of theory choice in normal and across extraordinary science is
-> central to philosophy of science. The emergence of machine learning models has
-> the potential to act as a wrench in the gears of current debates. In this
-> paper, I will attempt to reconstruct the main movements that lead to and came
-> out of Putnam's critical and explanatory tendency distinction, argue for the
-> biconditional necessity of the tendencies, and conceptualize that wrench
-> through a machine learning interpretation of my claim.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on interpreting philosophical concepts through a machine learning perspective, not specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [GLiREL -- Generalist Model for Zero-Shot Relation Extraction](https://arxiv.org/abs/http://arxiv.org/abs/2501.03172v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03172v1
-
-**Abstract:**
-> We introduce GLiREL (Generalist Lightweight model for zero-shot Relation
-> Extraction), an efficient architecture and training paradigm for zero-shot
-> relation classification. Inspired by recent advancements in zero-shot named
-> entity recognition, this work presents an approach to efficiently and
-> accurately predict zero-shot relationship labels between multiple entities in a
-> single forward pass. Experiments using the FewRel and WikiZSL benchmarks
-> demonstrate that our approach achieves state-of-the-art results on the
-> zero-shot relation classification task. In addition, we contribute a protocol
-> for synthetically-generating datasets with diverse relation labels.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new efficient architecture and training paradigm for zero-shot relation classification, rather than engineering, designing, or optimizing prompts for Large Language Models (LLMs) to improve their text generation performance.
-
----
-
-## [Can LLMs Design Good Questions Based on Context?](https://arxiv.org/abs/http://arxiv.org/abs/2501.03491v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03491v1
-
-**Abstract:**
-> This paper evaluates questions generated by LLMs from context, comparing them
-> to human-generated questions across six dimensions. We introduce an automated
-> LLM-based evaluation method, focusing on aspects like question length, type,
-> context coverage, and answerability. Our findings highlight unique
-> characteristics of LLM-generated questions, contributing insights that can
-> support further research in question quality and downstream applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on LLMs generating questions based on context, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) to improve their text generation performance through textual input manipulation.
-
----
-
-## [Self-Adaptive ERP: Embedding NLP into Petri-Net creation and Model
-  Matching](https://arxiv.org/abs/http://arxiv.org/abs/2501.03795v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03795v1
-
-**Abstract:**
-> Enterprise Resource Planning (ERP) consultants play a vital role in
-> customizing systems to meet specific business needs by processing large amounts
-> of data and adapting functionalities. However, the process is
-> resource-intensive, time-consuming, and requires continuous adjustments as
-> business demands evolve. This research introduces a Self-Adaptive ERP Framework
-> that automates customization using enterprise process models and system usage
-> analysis. It leverages Artificial Intelligence (AI) & Natural Language
-> Processing (NLP) for Petri nets to transform business processes into adaptable
-> models, addressing both structural and functional matching. The framework,
-> built using Design Science Research (DSR) and a Systematic Literature Review
-> (SLR), reduces reliance on manual adjustments, improving ERP customization
-> efficiency and accuracy while minimizing the need for consultants.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on automating ERP customization using AI and NLP for Petri-net creation and model matching, rather than specifically engineering, designing, or optimizing prompts for Large Language Models (LLMs). The mention of NLP is not centered around prompt engineering for LLMs, but rather for transforming business processes into adaptable models within an ERP framework.
-
----
-
-## [Synthetic Data Privacy Metrics](https://arxiv.org/abs/http://arxiv.org/abs/2501.03941v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03941v1
-
-**Abstract:**
-> Recent advancements in generative AI have made it possible to create
-> synthetic datasets that can be as accurate as real-world data for training AI
-> models, powering statistical insights, and fostering collaboration with
-> sensitive datasets while offering strong privacy guarantees. Effectively
-> measuring the empirical privacy of synthetic data is an important step in the
-> process. However, while there is a multitude of new privacy metrics being
-> published every day, there currently is no standardization. In this paper, we
-> review the pros and cons of popular metrics that include simulations of
-> adversarial attacks. We also review current best practices for amending
-> generative models to enhance the privacy of the data they create (e.g.
-> differential privacy).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on synthetic data privacy metrics, generative AI for dataset creation, and differential privacy, without addressing prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Traits of a Leader: User Influence Level Prediction through
-  Sociolinguistic Modeling](https://arxiv.org/abs/http://arxiv.org/abs/2501.04046v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04046v1
-
-**Abstract:**
-> Recognition of a user's influence level has attracted much attention as human
-> interactions move online. Influential users have the ability to sway others'
-> opinions to achieve some goals. As a result, predicting users' level of
-> influence can help to understand social networks, forecast trends, prevent
-> misinformation, etc. However, predicting user influence is a challenging
-> problem because the concept of influence is specific to a situation or a
-> domain, and user communications are limited to text. In this work, we define
-> user influence level as a function of community endorsement and develop a model
-> that significantly outperforms the baseline by leveraging demographic and
-> personality data. This approach consistently improves RankDCG scores across
-> eight different domains.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on sociolinguistic modeling for predicting user influence levels in online interactions, without primarily addressing the engineering, design, or optimization of prompts for Large Language Models (LLMs) or demonstrating the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [IntegrityAI at GenAI Detection Task 2: Detecting Machine-Generated
-  Academic Essays in English and Arabic Using ELECTRA and Stylometry](https://arxiv.org/abs/http://arxiv.org/abs/2501.05476v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05476v1
-
-**Abstract:**
-> Recent research has investigated the problem of detecting machine-generated
-> essays for academic purposes. To address this challenge, this research utilizes
-> pre-trained, transformer-based models fine-tuned on Arabic and English academic
-> essays with stylometric features. Custom models based on ELECTRA for English
-> and AraELECTRA for Arabic were trained and evaluated using a benchmark dataset.
-> Proposed models achieved excellent results with an F1-score of 99.7%, ranking
-> 2nd among 26 teams in the English subtask, and 98.4%, finishing 1st out of
-> 23 teams in the Arabic one.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on developing and fine-tuning transformer-based models (ELECTRA, AraELECTRA) for detecting machine-generated essays, rather than engineering or optimizing prompts for Large Language Models (LLMs). Prompt engineering is not the central concern of this research."
-}
-
----
-
-## [Fairness Through Matching](https://arxiv.org/abs/http://arxiv.org/abs/2501.02793v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02793v1
-
-**Abstract:**
-> Group fairness requires that different protected groups, characterized by a
-> given sensitive attribute, receive equal outcomes overall. Typically, the level
-> of group fairness is measured by the statistical gap between predictions from
-> different protected groups. In this study, we reveal an implicit property of
-> existing group fairness measures, which provides an insight into how the
-> group-fair models behave. Then, we develop a new group-fair constraint based on
-> this implicit property to learn group-fair models. To do so, we first introduce
-> a notable theoretical observation: every group-fair model has an implicitly
-> corresponding transport map between the input spaces of each protected group.
-> Based on this observation, we introduce a new group fairness measure termed
-> Matched Demographic Parity (MDP), which quantifies the averaged gap between
-> predictions of two individuals (from different protected groups) matched by a
-> given transport map. Then, we prove that any transport map can be used in MDP
-> to learn group-fair models, and develop a novel algorithm called Fairness
-> Through Matching (FTM), which learns a group-fair model using MDP constraint
-> with a user-specified transport map. We specifically propose two favorable
-> types of transport maps for MDP, based on the optimal transport theory, and
-> discuss their advantages. Experiments reveal that FTM successfully trains
-> group-fair models with certain desirable properties by choosing the transport
-> map accordingly.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs), instead concentrating on developing a new group-fair constraint and algorithm for learning group-fair models, with no mention of LLMs, prompt design, or textual input manipulation.
-
----
-
-## [CALM: Curiosity-Driven Auditing for Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02997v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02997v1
-
-**Abstract:**
-> Auditing Large Language Models (LLMs) is a crucial and challenging task. In
-> this study, we focus on auditing black-box LLMs without access to their
-> parameters, only to the provided service. We treat this type of auditing as a
-> black-box optimization problem where the goal is to automatically uncover
-> input-output pairs of the target LLMs that exhibit illegal, immoral, or unsafe
-> behaviors. For instance, we may seek a non-toxic input that the target LLM
-> responds to with a toxic output or an input that induces the hallucinative
-> response from the target LLM containing politically sensitive individuals. This
-> black-box optimization is challenging due to the scarcity of feasible points,
-> the discrete nature of the prompt space, and the large search space. To address
-> these challenges, we propose Curiosity-Driven Auditing for Large Language
-> Models (CALM), which uses intrinsically motivated reinforcement learning to
-> finetune an LLM as the auditor agent to uncover potential harmful and biased
-> input-output pairs of the target LLM. CALM successfully identifies derogatory
-> completions involving celebrities and uncovers inputs that elicit specific
-> names under the black-box setting. This work offers a promising direction for
-> auditing black-box LLMs. Our code is available at
-> https://github.com/x-zheng16/CALM.git.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on auditing Large Language Models (LLMs) using a reinforcement learning approach, rather than on the engineering, design, or optimization of prompts for improving LLM performance through textual input manipulation.
-
----
-
-## [Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering
-  alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.03012v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03012v1
-
-**Abstract:**
-> Multimodal LLMs have reached remarkable levels of proficiency in
-> understanding multimodal inputs, driving extensive research to develop
-> increasingly powerful models. However, much less attention has been paid to
-> understanding and explaining the underlying mechanisms of these models. Most
-> existing explainability research examines these models only in their final
-> states, overlooking the dynamic representational shifts that occur during
-> training. In this work, we systematically analyze the evolution of hidden state
-> representations to reveal how fine-tuning alters the internal structure of a
-> model to specialize in new multimodal tasks. Using a concept-based approach, we
-> map hidden states to interpretable visual and textual concepts, enabling us to
-> trace changes in encoded concepts across modalities as training progresses. We
-> also demonstrate the use of shift vectors to capture these concept changes.
-> These shift vectors allow us to recover fine-tuned concepts by shifting those
-> in the original model. Finally, we explore the practical impact of our findings
-> on model steering, showing that we can adjust multimodal LLMs behaviors without
-> any training, such as modifying answer types, captions style, or biasing the
-> model toward specific responses. Our work sheds light on how multimodal
-> representations evolve through fine-tuning and offers a new perspective for
-> interpreting model adaptation in multimodal tasks. The code for this project is
-> publicly available at https://github.com/mshukor/xl-vlms.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on analyzing and understanding the internal representation shifts of multimodal LLMs during fine-tuning, rather than engineering, designing, or optimizing prompts specifically for LLMs, thus failing to meet the core 'MUST' criteria."
-}
-
----
-
-## [Large language models for artificial general intelligence (AGI): A
-  survey of foundational principles and approaches](https://arxiv.org/abs/http://arxiv.org/abs/2501.03151v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03151v1
-
-**Abstract:**
-> Generative artificial intelligence (AI) systems based on large-scale
-> pretrained foundation models (PFMs) such as vision-language models, large
-> language models (LLMs), diffusion models and vision-language-action (VLA)
-> models have demonstrated the ability to solve complex and truly non-trivial AI
-> problems in a wide variety of domains and contexts. Multimodal large language
-> models (MLLMs), in particular, learn from vast and diverse data sources,
-> allowing rich and nuanced representations of the world and, thereby, providing
-> extensive capabilities, including the ability to reason, engage in meaningful
-> dialog; collaborate with humans and other agents to jointly solve complex
-> problems; and understand social and emotional aspects of humans. Despite this
-> impressive feat, the cognitive abilities of state-of-the-art LLMs trained on
-> large-scale datasets are still superficial and brittle. Consequently, generic
-> LLMs are severely limited in their generalist capabilities. A number of
-> foundational problems -- embodiment, symbol grounding, causality and memory --
-> are required to be addressed for LLMs to attain human-level general
-> intelligence. These concepts are more aligned with human cognition and provide
-> LLMs with inherent human-like cognitive properties that support the realization
-> of physically-plausible, semantically meaningful, flexible and more
-> generalizable knowledge and intelligence. In this work, we discuss the
-> aforementioned foundational issues and survey state-of-the-art approaches for
-> implementing these concepts in LLMs. Specifically, we discuss how the
-> principles of embodiment, symbol grounding, causality and memory can be
-> leveraged toward the attainment of artificial general intelligence (AGI) in an
-> organic manner.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on foundational principles and approaches for achieving Artificial General Intelligence (AGI) with Large Language Models (LLMs), rather than specifically on the engineering, design, or optimization of prompts for LLMs. It lacks concrete examples of prompts and their impact on LLM output, which is a required criterion.
-
----
-
-## [From Aleatoric to Epistemic: Exploring Uncertainty Quantification
-  Techniques in Artificial Intelligence](https://arxiv.org/abs/http://arxiv.org/abs/2501.03282v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03282v1
-
-**Abstract:**
-> Uncertainty quantification (UQ) is a critical aspect of artificial
-> intelligence (AI) systems, particularly in high-risk domains such as
-> healthcare, autonomous systems, and financial technology, where decision-making
-> processes must account for uncertainty. This review explores the evolution of
-> uncertainty quantification techniques in AI, distinguishing between aleatoric
-> and epistemic uncertainties, and discusses the mathematical foundations and
-> methods used to quantify these uncertainties. We provide an overview of
-> advanced techniques, including probabilistic methods, ensemble learning,
-> sampling-based approaches, and generative models, while also highlighting
-> hybrid approaches that integrate domain-specific knowledge. Furthermore, we
-> examine the diverse applications of UQ across various fields, emphasizing its
-> impact on decision-making, predictive accuracy, and system robustness. The
-> review also addresses key challenges such as scalability, efficiency, and
-> integration with explainable AI, and outlines future directions for research in
-> this rapidly developing area. Through this comprehensive survey, we aim to
-> provide a deeper understanding of UQ's role in enhancing the reliability,
-> safety, and trustworthiness of AI systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses primarily on uncertainty quantification techniques in AI across various high-risk domains, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not provide concrete examples of prompts impacting LLM output.
-
----
-
-## [Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function
-  for Real-Time Strategy Tasks](https://arxiv.org/abs/http://arxiv.org/abs/2501.03824v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03824v1
-
-**Abstract:**
-> Effective evaluation of real-time strategy tasks requires adaptive mechanisms
-> to cope with dynamic and unpredictable environments. This study proposes a
-> method to improve evaluation functions for real-time responsiveness to
-> battle-field situation changes, utilizing an online reinforcement
-> learning-based dynamic weight adjustment mechanism within the real-time
-> strategy game. Building on traditional static evaluation functions, the method
-> employs gradient descent in online reinforcement learning to update weights
-> dynamically, incorporating weight decay techniques to ensure stability.
-> Additionally, the AdamW optimizer is integrated to adjust the learning rate and
-> decay rate of online reinforcement learning in real time, further reducing the
-> dependency on manual parameter tuning. Round-robin competition experiments
-> demonstrate that this method significantly enhances the application
-> effectiveness of the Lanchester combat model evaluation function, Simple
-> evaluation function, and Simple Sqrt evaluation function in planning algorithms
-> including IDABCD, IDRTMinimax, and Portfolio AI. The method achieves a notable
-> improvement in scores, with the enhancement becoming more pronounced as the
-> map size increases. Furthermore, the increase in evaluation function
-> computation time induced by this method is kept below 6% for all evaluation
-> functions and planning algorithms. The proposed dynamic adaptive evaluation
-> function demonstrates a promising approach for real-time strategy task
-> evaluation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on improving evaluation functions for real-time strategy tasks using online reinforcement learning, with no emphasis on prompt engineering, design, or optimization for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria.
-
----
-
-## [Dolphin: Closed-loop Open-ended Auto-research through Thinking,
-  Practice, and Feedback](https://arxiv.org/abs/http://arxiv.org/abs/2501.03916v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03916v2
-
-**Abstract:**
-> The scientific research paradigm is undergoing a profound transformation
-> owing to the development of Artificial Intelligence (AI). Recent works
-> demonstrate that various AI-assisted research methods can largely improve
-> research efficiency by improving data analysis, accelerating computation, and
-> fostering novel idea generation. To further move towards the ultimate goal
-> (i.e., automatic scientific research), in this paper, we propose Dolphin, the
-> first closed-loop open-ended auto-research framework to further build the
-> entire process of human scientific research. Dolphin can generate research
-> ideas, perform experiments, and get feedback from experimental results to
-> generate higher-quality ideas. More specifically, Dolphin first generates novel
-> ideas based on relevant papers which are ranked by the topic and task
-> attributes. Then, the codes are automatically generated and debugged with the
-> exception-traceback-guided local code structure. Finally, Dolphin automatically
-> analyzes the results of each idea and feeds the results back to the next round
-> of idea generation. Experiments are conducted on the benchmark datasets of
-> different topics and results show that Dolphin can generate novel ideas
-> continuously and complete the experiment in a loop. We highlight that Dolphin
-> can automatically propose methods that are comparable to the state-of-the-art
-> in some tasks such as 2D image classification and 3D point classification.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing an auto-research framework using AI, with Large Language Models (LLMs) being only one of the potential tools used for generating research ideas, rather than the central focus on prompt engineering for text-based interactions with LLMs as required.
-
----
-
-## [Interpretable Neural ODEs for Gene Regulatory Network Discovery under
-  Perturbations](https://arxiv.org/abs/http://arxiv.org/abs/2501.02409v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02409v1
-
-**Abstract:**
-> Modern high-throughput biological datasets with thousands of perturbations
-> provide the opportunity for large-scale discovery of causal graphs that
-> represent the regulatory interactions between genes. Numerous methods have been
-> proposed to infer a directed acyclic graph (DAG) corresponding to the
-> underlying gene regulatory network (GRN) that captures causal gene
-> relationships. However, existing models have restrictive assumptions (e.g.
-> linearity, acyclicity), limited scalability, and/or fail to address the dynamic
-> nature of biological processes such as cellular differentiation. We propose
-> PerturbODE, a novel framework that incorporates biologically informative neural
-> ordinary differential equations (neural ODEs) to model cell state trajectories
-> under perturbations and derive the causal GRN from the neural ODE's parameters.
-> We demonstrate PerturbODE's efficacy in trajectory prediction and GRN inference
-> across simulated and real over-expression datasets.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet any of the 'MUST' criteria, as it focuses on gene regulatory network discovery using neural ODEs, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, falling outside the specified scope.
-
----
-
-## [A Statistical Hypothesis Testing Framework for Data Misappropriation
-  Detection in Large Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02441v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02441v1
-
-**Abstract:**
-> Large Language Models (LLMs) are rapidly gaining enormous popularity in
-> recent years. However, the training of LLMs has raised significant privacy and
-> legal concerns, particularly regarding the inclusion of copyrighted materials
-> in their training data without proper attribution or licensing, which falls
-> under the broader issue of data misappropriation. In this article, we focus on
-> a specific problem of data misappropriation detection, namely, to determine
-> whether a given LLM has incorporated data generated by another LLM. To address
-> this issue, we propose embedding watermarks into the copyrighted training data
-> and formulating the detection of data misappropriation as a hypothesis testing
-> problem. We develop a general statistical testing framework, construct a
-> pivotal statistic, determine the optimal rejection threshold, and explicitly
-> control the type I and type II errors. Furthermore, we establish the asymptotic
-> optimality properties of the proposed tests, and demonstrate its empirical
-> effectiveness through intensive numerical experiments.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on detecting data misappropriation in LLMs through statistical hypothesis testing, with no emphasis on prompt engineering, design, or optimization for improving LLM performance through textual input prompts."
-}
-
----
-
-## [RTLMarker: Protecting LLM-Generated RTL Copyright via a Hardware
-  Watermarking Framework](https://arxiv.org/abs/http://arxiv.org/abs/2501.02446v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02446v1
-
-**Abstract:**
-> Recent advances of large language models in the field of Verilog generation
-> have raised several ethical and security concerns, such as code copyright
-> protection and dissemination of malicious code. Researchers have employed
-> watermarking techniques to identify codes generated by large language models.
-> However, the existing watermarking works fail to protect RTL code copyright due
-> to the significant syntactic and semantic differences between RTL code and
-> software code in languages such as Python. This paper proposes a hardware
-> watermarking framework RTLMarker that embeds watermarks into RTL code and
-> deeper into the synthesized netlist. We propose a set of rule-based Verilog
-> code transformations, ensuring the watermarked RTL code's syntactic and
-> semantic correctness. In addition, we consider an inherent tradeoff between
-> watermark transparency and watermark effectiveness and jointly optimize them.
-> The results demonstrate RTLMarker's superiority over the baseline in RTL code
-> watermarking.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on a hardware watermarking framework (RTLMarker) for protecting RTL code copyright generated by LLMs, rather than exploring prompt engineering techniques for improving LLM performance through textual input manipulation."
-}
-
----
-
-## [Enhancing Contrastive Learning for Retinal Imaging via Adjusted
-  Augmentation Scales](https://arxiv.org/abs/http://arxiv.org/abs/2501.02451v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02451v1
-
-**Abstract:**
-> Contrastive learning, a prominent approach within self-supervised learning,
-> has demonstrated significant effectiveness in developing generalizable models
-> for various applications involving natural images. However, recent research
-> indicates that these successes do not necessarily extend to the medical imaging
-> domain. In this paper, we investigate the reasons for this suboptimal
-> performance and hypothesize that the dense distribution of medical images poses
-> challenges to the pretext tasks in contrastive learning, particularly in
-> constructing positive and negative pairs. We explore model performance under
-> different augmentation strategies and compare the results to those achieved
-> with strong augmentations. Our study includes six publicly available datasets
-> covering multiple clinically relevant tasks. We further assess the model's
-> generalizability through external evaluations. The model pre-trained with weak
-> augmentation outperforms those with strong augmentation, improving AUROC from
-> 0.838 to 0.848 and AUPR from 0.523 to 0.597 on MESSIDOR2, and showing similar
-> enhancements across other datasets. Our findings suggest that optimizing the
-> scale of augmentation is critical for enhancing the efficacy of contrastive
-> learning in medical imaging.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on enhancing contrastive learning for medical imaging (retinal imaging), which violates the 'MUST NOT' criteria: being primarily concerned with medical subjects and not focusing on prompt engineering for Large Language Models (LLMs) or text generation driven by LLMs."
-}
-
----
-
-## [Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment
-  of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine](https://arxiv.org/abs/http://arxiv.org/abs/2501.02471v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02471v1
-
-**Abstract:**
-> Large language models (LLMs) primarily trained on English texts, often face
-> biases and inaccuracies in Chinese contexts. Their limitations are pronounced
-> in fields like Traditional Chinese Medicine (TCM), where cultural and clinical
-> subtleties are vital, further hindered by a lack of domain-specific data, such
-> as rheumatoid arthritis (RA). To address these issues, this paper introduces
-> Hengqin-RA-v1, the first large language model specifically tailored for TCM
-> with a focus on diagnosing and treating RA. We also present HQ-GCM-RA-C1, a
-> comprehensive RA-specific dataset curated from ancient Chinese medical
-> literature, classical texts, and modern clinical studies. This dataset empowers
-> Hengqin-RA-v1 to deliver accurate and culturally informed responses,
-> effectively bridging the gaps left by general-purpose models. Extensive
-> experiments demonstrate that Hengqin-RA-v1 outperforms state-of-the-art models,
-> even surpassing the diagnostic accuracy of TCM practitioners in certain cases.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new Large Language Model architecture (Hengqin-RA-v1) specifically tailored for Traditional Chinese Medicine, rather than prompt engineering for existing LLMs, and introduces a new dataset for training, which violates 'MUST NOT' criterion 1.
-
----
-
-## [The Meta-Representation Hypothesis](https://arxiv.org/abs/http://arxiv.org/abs/2501.02481v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02481v1
-
-**Abstract:**
-> Humans rely on high-level meta-representations to engage in abstract
-> reasoning. In complex cognitive tasks, these meta-representations help
-> individuals abstract general rules from experience. However, constructing such
-> meta-representations from high-dimensional observations remains a longstanding
-> challenge for reinforcement learning agents. For instance, a well-trained agent
-> often fails to generalize to even minor variations of the same task, such as
-> changes in background color, which humans can easily handle. In this paper, we
-> build a bridge between meta-representation and generalization, showing that
-> generalization performance benefits from meta-representation learning. We also
-> hypothesize that deep mutual learning (DML) among agents can help them converge
-> to meta-representations. Empirical results provide support for our theory and
-> hypothesis. Overall, this work provides a new perspective on the generalization
-> of deep reinforcement learning.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it primarily focuses on reinforcement learning, meta-representation, and generalization in deep learning, with no clear emphasis on prompt engineering, design, or optimization specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment
-  Retrieval and Highlight Detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.02504v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02504v1
-
-**Abstract:**
-> The goal of video moment retrieval and highlight detection is to identify
-> specific segments and highlights based on a given text query. With the rapid
-> growth of video content and the overlap between these tasks, recent works have
-> addressed both simultaneously. However, they still struggle to fully capture
-> the overall video context, making it challenging to determine which words are
-> most relevant. In this paper, we present a novel Video Context-aware Keyword
-> Attention module that overcomes this limitation by capturing keyword variation
-> within the context of the entire video. To achieve this, we introduce a video
-> context clustering module that provides concise representations of the overall
-> video context, thereby enhancing the understanding of keyword dynamics.
-> Furthermore, we propose a keyword weight detection module with keyword-aware
-> contrastive learning that incorporates keyword information to enhance
-> fine-grained alignment between visual and textual features. Extensive
-> experiments on the QVHighlights, TVSum, and Charades-STA benchmarks demonstrate
-> that our proposed method significantly improves performance in moment retrieval
-> and highlight detection tasks compared to existing approaches. Our code is
-> available at: https://github.com/VisualAIKHU/Keyword-DETR
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on video moment retrieval and highlight detection, utilizing a novel Video Context-aware Keyword Attention module, with no evident focus on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [Remote Inference over Dynamic Links via Adaptive Rate Deep Task-Oriented
-  Vector Quantization](https://arxiv.org/abs/http://arxiv.org/abs/2501.02521v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02521v1
-
-**Abstract:**
-> A broad range of technologies rely on remote inference, wherein data acquired
-> is conveyed over a communication channel for inference in a remote server.
-> Communication between the participating entities is often carried out over
-> rate-limited channels, necessitating data compression for reducing latency.
-> While deep learning facilitates joint design of the compression mapping along
-> with encoding and inference rules, existing learned compression mechanisms are
-> static, and struggle in adapting their resolution to changes in channel
-> conditions and to dynamic links. To address this, we propose Adaptive Rate
-> Task-Oriented Vector Quantization (ARTOVeQ), a learned compression mechanism
-> that is tailored for remote inference over dynamic links. ARTOVeQ is based on
-> designing nested codebooks along with a learning algorithm employing
-> progressive learning. We show that ARTOVeQ extends to support low-latency
-> inference that is gradually refined via successive refinement principles, and
-> that it enables the simultaneous usage of multiple resolutions when conveying
-> high-dimensional data. Numerical results demonstrate that the proposed scheme
-> yields remote deep inference that operates with multiple rates, supports a
-> broad range of bit budgets, and facilitates rapid inference that gradually
-> improves with more bits exchanged, while approaching the performance of
-> single-rate deep quantization methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on adaptive rate deep task-oriented vector quantization for remote inference over dynamic links, without any primary concern for prompt engineering, Large Language Models (LLMs), or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02523v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02523v1
-
-**Abstract:**
-> Facial images have extensive practical applications. Although the current
-> large-scale text-image diffusion models exhibit strong generation capabilities,
-> it is challenging to generate the desired facial images using only a text prompt.
-> Image prompts are a logical choice. However, current methods of this type
-> generally focus on the general domain. In this paper, we aim to optimize image
-> makeup techniques to generate the desired facial images. Specifically, (1) we
-> built a dataset of 4 million high-quality face image-text pairs
-> (FaceCaptionHQ-4M) based on LAION-Face to train our Face-MakeUp model; (2) to
-> maintain consistency with the reference facial image, we extract/learn
-> multi-scale content features and pose features for the facial image,
-> integrating these into the diffusion model to enhance the preservation of
-> facial identity features for diffusion models. Validation on two face-related
-> test datasets demonstrates that our Face-MakeUp can achieve the best
-> comprehensive performance. All code is available
-> at: https://github.com/ddw2AIGROUP2CQUPT/Face-MakeUp
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on optimizing image makeup techniques for text-to-image generation, which falls under image generation driven by non-text-based interactions with generative models, violating the 'MUST NOT' criteria 2, and does not meet the core subject requirement of prompt engineering for text-based interactions with Large Language Models (LLMs)."
-}
-
----
-
-## [AMM: Adaptive Modularized Reinforcement Model for Multi-city Traffic
-  Signal Control](https://arxiv.org/abs/http://arxiv.org/abs/2501.02548v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02548v1
-
-**Abstract:**
-> Traffic signal control (TSC) is an important and widely studied direction.
-> Recently, reinforcement learning (RL) methods have been used to solve TSC
-> problems and achieve superior performance over conventional TSC methods.
-> However, applying RL methods to the real world is challenging due to the huge
-> cost of experiments in real-world traffic environments. One possible solution
-> is TSC domain adaptation, which adapts trained models to target environments
-> and reduces the number of interactions and the training cost. However, existing
-> TSC domain adaptation methods still face two major issues: the lack of
-> consideration for differences across cities and the low utilization of
-> multi-city data.
-> To solve the aforementioned issues, we propose an approach named Adaptive
-> Modularized Model (AMM). By modularizing TSC problems and network models, we
-> overcome the challenge of possible changes in environmental observations. We
-> also aggregate multi-city experience through meta-learning. We conduct
-> extensive experiments on different cities and show that AMM can achieve
-> excellent performance with limited interactions in target environments and
-> outperform existing methods. We also demonstrate the feasibility and
-> generalizability of our method.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on Adaptive Modularized Reinforcement Model for Multi-city Traffic Signal Control, which does not meet the 'MUST' criteria of primarily focusing on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompt manipulation.
-
----
-
-## [KM-UNet KAN Mamba UNet for medical image segmentation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02559v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02559v1
-
-**Abstract:**
-> Medical image segmentation is a critical task in medical imaging analysis.
-> Traditional CNN-based methods struggle with modeling long-range dependencies,
-> while Transformer-based models, despite their success, suffer from quadratic
-> computational complexity. To address these limitations, we propose KM-UNet, a
-> novel U-shaped network architecture that combines the strengths of
-> Kolmogorov-Arnold Networks (KANs) and state-space models (SSMs). KM-UNet
-> leverages the Kolmogorov-Arnold representation theorem for efficient feature
-> representation and SSMs for scalable long-range modeling, achieving a balance
-> between accuracy and computational efficiency. We evaluate KM-UNet on five
-> benchmark datasets: ISIC17, ISIC18, CVC, BUSI, and GLAS. Experimental results
-> demonstrate that KM-UNet achieves competitive performance compared to
-> state-of-the-art methods in medical image segmentation tasks. To the best of
-> our knowledge, KM-UNet is the first medical image segmentation framework
-> integrating KANs and SSMs. This work provides a valuable baseline and new
-> insights for the development of more efficient and interpretable medical image
-> segmentation systems. The code is open source at
-> https://github.com/2760613195/KM_UNet
-> Keywords: KAN, Mamba, state-space models, UNet, Medical image segmentation, Deep
-> learning
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on medical image segmentation using a novel U-shaped network architecture, which falls under excluded subjects (medical) and does not meet the 'MUST' criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Decoding fMRI Data into Captions using Prefix Language Modeling](https://arxiv.org/abs/http://arxiv.org/abs/2501.02570v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02570v1
-
-**Abstract:**
-> With the advancements in Large Language and Latent Diffusion models, brain
-> decoding has achieved remarkable results in recent years. The works on the NSD
-> dataset, with stimuli images from the COCO dataset, leverage the embeddings
-> from the CLIP model for image reconstruction and GIT for captioning. However,
-> the current captioning approach introduces the challenge of potential data
-> contamination given that the GIT model was trained on the COCO dataset. In this
-> work, we present an alternative method for decoding brain signals into image
-> captions by predicting a DINOv2 model's embedding of an image from the
-> corresponding fMRI signal and then providing its [CLS] token as the prefix to
-> the GPT-2 language model, which decreases computational requirements
-> considerably. Additionally, instead of commonly used Linear Regression, we
-> explore 3D Convolutional Neural Network mapping of fMRI signals to image
-> embedding space to better account for the positional information of voxels.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on decoding fMRI data into image captions using a combination of computer vision and language models, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs). The use of a GPT-2 model with a predicted prefix does not constitute a primary focus on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Evolving Skeletons: Motion Dynamics in Action Recognition](https://arxiv.org/abs/http://arxiv.org/abs/2501.02593v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02593v1
-
-**Abstract:**
-> Skeleton-based action recognition has gained significant attention for its
-> ability to efficiently represent spatiotemporal information in a lightweight
-> format. Most existing approaches use graph-based models to process skeleton
-> sequences, where each pose is represented as a skeletal graph structured around
-> human physical connectivity. Among these, the Spatiotemporal Graph
-> Convolutional Network (ST-GCN) has become a widely used framework.
-> Alternatively, hypergraph-based models, such as the Hyperformer, capture
-> higher-order correlations, offering a more expressive representation of complex
-> joint interactions. A recent advancement, termed Taylor Videos, introduces
-> motion-enhanced skeleton sequences by embedding motion concepts, providing a
-> fresh perspective on interpreting human actions in skeleton-based action
-> recognition. In this paper, we conduct a comprehensive evaluation of both
-> traditional skeleton sequences and Taylor-transformed skeletons using ST-GCN
-> and Hyperformer models on the NTU-60 and NTU-120 datasets. We compare skeletal
-> graph and hypergraph representations, analyzing static poses against
-> motion-injected poses. Our findings highlight the strengths and limitations of
-> Taylor-transformed skeletons, demonstrating their potential to enhance motion
-> dynamics while exposing current challenges in fully using their benefits. This
-> study underscores the need for innovative skeletal modelling techniques to
-> effectively handle motion-rich data and advance the field of action
-> recognition.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on skeleton-based action recognition, graph/hypergraph models (ST-GCN, Hyperformer), and motion dynamics analysis, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Empowering Bengali Education with AI: Solving Bengali Math Word Problems
-  through Transformer Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02599v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02599v1
-
-**Abstract:**
-> Mathematical word problems (MWPs) involve the task of converting textual
-> descriptions into mathematical equations. This poses a significant challenge in
-> natural language processing, particularly for low-resource languages such as
-> Bengali. This paper addresses this challenge by developing an innovative
-> approach to solving Bengali MWPs using transformer-based models, including
-> Basic Transformer, mT5, BanglaT5, and mBART50. To support this effort, the
-> "PatiGonit" dataset was introduced, containing 10,000 Bengali math problems,
-> and these models were fine-tuned to translate the word problems into equations
-> accurately. The evaluation revealed that the mT5 model achieved the highest
-> accuracy of 97.30%, demonstrating the effectiveness of transformer models in
-> this domain. This research marks a significant step forward in Bengali natural
-> language processing, offering valuable methodologies and resources for
-> educational AI tools. By improving math education, it also supports the
-> development of advanced problem-solving skills for Bengali-speaking students.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on developing an approach for solving Bengali math word problems using transformer-based models, including fine-tuning these models, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs) to improve their text generation performance."
-}
-
----
-
-## [LLMs Help Alleviate the Cross-Subject Variability in Brain Signal and
-  Language Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.02621v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02621v1
-
-**Abstract:**
-> Decoding human activity from EEG signals has long been a popular research
-> topic. While recent studies have increasingly shifted focus from single-subject
-> to cross-subject analysis, few have explored the model's ability to perform
-> zero-shot predictions on EEG signals from previously unseen subjects. This
-> research aims to investigate whether deep learning methods can capture
-> subject-independent semantic information inherent in human EEG signals. Such
-> insights are crucial for Brain-Computer Interfaces (BCI) because, on one hand,
-> they demonstrate the model's robustness against subject-specific temporal
-> biases, and on the other, they significantly enhance the generalizability of
-> downstream tasks. We employ Large Language Models (LLMs) as denoising agents to
-> extract subject-independent semantic features from noisy EEG signals.
-> Experimental results, including ablation studies, highlight the pivotal role of
-> LLMs in decoding subject-independent semantic information from noisy EEG data.
-> We hope our findings will contribute to advancing BCI research and assist both
-> academia and industry in applying EEG signals to a broader range of
-> applications.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on using LLMs as denoising agents for decoding subject-independent semantic information from EEG signals in Brain-Computer Interfaces (BCI) research, rather than on prompt engineering for text-based interactions with LLMs, failing to meet the core subject criterion."
-}
-
----
-
-## [Trust and Dependability in Blockchain & AI Based MedIoT Applications:
-  Research Challenges and Future Directions](https://arxiv.org/abs/http://arxiv.org/abs/2501.02647v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02647v1
-
-**Abstract:**
-> This paper critically reviews the integration of Artificial Intelligence (AI)
-> and blockchain technologies in the context of Medical Internet of Things
-> (MedIoT) applications, where they collectively promise to revolutionize
-> healthcare delivery. By examining current research, we underscore AI's
-> potential in advancing diagnostics and patient care, alongside blockchain's
-> capacity to bolster data security and patient privacy. We focus particularly on
-> the imperative to cultivate trust and ensure reliability within these systems.
-> Our review highlights innovative solutions for managing healthcare data and
-> challenges such as ensuring scalability, maintaining privacy, and promoting
-> ethical practices within the MedIoT domain. We present a vision for integrating
-> AI-driven insights with blockchain security in healthcare, offering a
-> comprehensive review of current research and future directions. We conclude
-> with a set of identified research gaps and propose that addressing these is
-> crucial for achieving the dependable, secure, and patient-centric MedIoT
-> applications of tomorrow.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the integration of AI and blockchain for MedIoT applications, addressing trust, security, and healthcare, which falls under excluded subjects (medical) and does not meet the 'MUST' criteria for focusing on prompt engineering for Large Language Models (LLMs).
-
----
-
-## [Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical
-  Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features](https://arxiv.org/abs/http://arxiv.org/abs/2501.02649v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02649v1
-
-**Abstract:**
-> Predicting plant species composition in specific spatiotemporal contexts
-> plays an important role in biodiversity management and conservation, as well as
-> in improving species identification tools. Our work utilizes 88,987 plant
-> survey records conducted in specific spatiotemporal contexts across Europe. We
-> also use the corresponding satellite images, time series data, climate time
-> series, and other rasterized environmental data such as land cover, human
-> footprint, bioclimatic, and soil variables as training data to train the model
-> to predict the outcomes of 4,716 plant surveys. We propose a feature
-> construction and result correction method based on the graph structure. Through
-> comparative experiments, we select the best-performing backbone networks for
-> feature extraction in both temporal and image modalities. In this process, we
-> built a backbone network based on the Swin-Transformer Block for extracting
-> temporal Cubes features. We then design a hierarchical cross-attention
-> mechanism capable of robustly fusing features from multiple modalities. During
-> training, we adopt a 10-fold cross-fusion method based on fine-tuning and use a
-> Threshold Top-K method for post-processing. Ablation experiments demonstrate
-> the improvements in model performance brought by our proposed solution
-> pipeline.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-modal (image, satellite, time series, environmental data) plant species prediction using graph-based and vision backbone-extracted features, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions, failing to meet the primary 'MUST' criteria.
-
----
-
-## [Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence
-  Benchmarks](https://arxiv.org/abs/http://arxiv.org/abs/2501.02654v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02654v2
-
-**Abstract:**
-> Recent advancements in natural language processing have highlighted the
-> vulnerability of deep learning models to adversarial attacks. While various
-> defence mechanisms have been proposed, there is a lack of comprehensive
-> benchmarks that evaluate these defences across diverse datasets, models, and
-> tasks. In this work, we address this gap by presenting an extensive benchmark
-> for textual adversarial defence that significantly expands upon previous work.
-> Our benchmark incorporates a wide range of datasets, evaluates state-of-the-art
-> defence mechanisms, and extends the assessment to include critical tasks such
-> as single-sentence classification, similarity and paraphrase identification,
-> natural language inference, and commonsense reasoning. This work not only
-> serves as a valuable resource for researchers and practitioners in the field of
-> adversarial robustness but also identifies key areas for future research in
-> textual adversarial defence. By establishing a new standard for benchmarking in
-> this domain, we aim to accelerate progress towards more robust and reliable
-> natural language processing systems.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a benchmark for evaluating adversarial defence mechanisms in NLP, rather than primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not provide concrete examples of prompts with their impact on LLM output.
-
----
-
-## [Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for
-  Personalized Micro-Video Recommendation](https://arxiv.org/abs/http://arxiv.org/abs/2501.02666v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02666v1
-
-**Abstract:**
-> Micro-video recommendation is attracting global attention and becoming a
-> popular daily service for people of all ages. Recently, Graph Neural
-> Networks-based micro-video recommendation has displayed performance improvement
-> for many kinds of recommendation tasks. However, the existing works fail to
-> fully consider the characteristics of micro-videos, such as the high timeliness
-> of news nature micro-video recommendation and sequential interactions of
-> frequently changed interests. In this paper, a novel Multi-aggregator
-> Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for
-> personalized news nature micro-video recommendation based on sequential
-> sessions, where characteristics of micro-videos are comprehensively studied,
-> users' preference is mined via multi-aggregator, the temporal and dynamic
-> changes of users' preference are captured, and timeliness is considered.
-> Through comparison with state-of-the-art methods, the experimental results
-> validate the superiority of our MTHGNN model.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a novel Graph Neural Network for personalized micro-video recommendation, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [From Superficial Patterns to Semantic Understanding: Fine-Tuning
-  Language Models on Contrast Sets](https://arxiv.org/abs/http://arxiv.org/abs/2501.02683v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02683v2
-
-**Abstract:**
-> Large-scale pre-trained language models have demonstrated high performance on
-> standard datasets for natural language inference (NLI) tasks. Unfortunately,
-> these evaluations can be misleading, as although the models can perform well on
-> in-distribution data, they perform poorly on out-of-distribution test sets,
-> such as contrast sets. Contrast sets consist of perturbed instances of data
-> that have very minor, but meaningful, changes to the input that alter the gold
-> label, revealing how models can learn superficial patterns in the training data
-> rather than learning more sophisticated language nuances. As an example, the
-> ELECTRA-small language model achieves nearly 90% accuracy on an SNLI dataset
-> but drops to 75% when tested on an out-of-distribution contrast set. The
-> research carried out in this study explores how the robustness of a language
-> model can be improved by exposing it to small amounts of more complex contrast
-> sets during training to help it better learn language patterns. With this
-> approach, the model recovers performance and achieves nearly 90% accuracy on
-> contrast sets, highlighting the importance of diverse and challenging training
-> data.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on fine-tuning language models with contrast sets for improved robustness, which aligns with developing new training methods, and does not investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [Horizon Generalization in Reinforcement Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02709v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02709v1
-
-**Abstract:**
-> We study goal-conditioned RL through the lens of generalization, but not in
-> the traditional sense of random augmentations and domain randomization. Rather,
-> we aim to learn goal-directed policies that generalize with respect to the
-> horizon: after training to reach nearby goals (which are easy to learn), these
-> policies should succeed in reaching distant goals (which are quite challenging
-> to learn). In the same way that invariance is closely linked with
-> generalization in other areas of machine learning (e.g., normalization layers
-> make a network invariant to scale, and therefore generalize to inputs of
-> varying scales), we show that this notion of horizon generalization is closely
-> linked with invariance to planning: a policy navigating towards a goal will
-> select the same actions as if it were navigating to a waypoint en route to that
-> goal. Thus, such a policy trained to reach nearby goals should succeed at
-> reaching arbitrarily-distant goals. Our theoretical analysis proves that both
-> horizon generalization and planning invariance are possible, under some
-> assumptions. We present new experimental results and recall findings from prior
-> work in support of our theoretical results. Taken together, our results open
-> the door to studying how techniques for invariance and generalization developed
-> in other areas of machine learning might be adapted to achieve this alluring
-> property.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on reinforcement learning, goal-conditioned policies, and horizon generalization, with no apparent connection to prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Improved Data Encoding for Emerging Computing Paradigms: From Stochastic
-  to Hyperdimensional Computing](https://arxiv.org/abs/http://arxiv.org/abs/2501.02715v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02715v1
-
-**Abstract:**
-> Data encoding is a fundamental step in emerging computing paradigms,
-> particularly in stochastic computing (SC) and hyperdimensional computing (HDC),
-> where it plays a crucial role in determining the overall system performance and
-> hardware cost efficiency. This study presents an advanced encoding strategy
-> that leverages a hardware-friendly class of low-discrepancy (LD) sequences,
-> specifically powers-of-2 bases of Van der Corput (VDC) sequences (VDC-2^n), as
-> sources for random number generation. Our approach significantly enhances the
-> accuracy and efficiency of SC and HDC systems by addressing challenges
-> associated with randomness. By employing LD sequences, we improve correlation
-> properties and reduce hardware complexity. Experimental results demonstrate
-> significant improvements in accuracy and energy savings for SC and HDC systems.
-> Our solution provides a robust framework for integrating SC and HDC in
-> resource-constrained environments, paving the way for efficient and scalable AI
-> implementations.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance, instead concentrating on data encoding strategies for stochastic and hyperdimensional computing paradigms.
-
----
-
-## [Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test
-  Data](https://arxiv.org/abs/http://arxiv.org/abs/2501.02727v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02727v1
-
-**Abstract:**
-> We present HiRMed (Hierarchical RAG-enhanced Medical Test Recommendation), a
-> novel tree-structured recommendation system that leverages Retrieval-Augmented
-> Generation (RAG) for intelligent medical test recommendations. Unlike
-> traditional vector similarity-based approaches, our system performs medical
-> reasoning at each tree node through a specialized RAG process. Starting from
-> the root node with initial symptoms, the system conducts step-wise medical
-> analysis to identify potential underlying conditions and their corresponding
-> diagnostic requirements. At each level, instead of simple matching, our
-> RAG-enhanced nodes analyze retrieved medical knowledge to understand
-> symptom-disease relationships and determine the most appropriate diagnostic
-> path. The system dynamically adjusts its recommendation strategy based on
-> medical reasoning results, considering factors such as urgency levels and
-> diagnostic uncertainty. Experimental results demonstrate that our approach
-> achieves superior performance in terms of coverage rate, accuracy, and miss
-> rate compared to conventional retrieval-based methods. This work represents a
-> significant advance in medical test recommendation by introducing medical
-> reasoning capabilities into the traditional tree-based retrieval structure.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on a medical application (medical test recommendation system) and develops a new system architecture (Tree-based RAG-Agent) rather than focusing on prompt engineering for Large Language Models (LLMs) in text-based interactions, failing to meet the required criteria.
-
----
-
-## [AFed: Algorithmic Fair Federated Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02732v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02732v1
-
-**Abstract:**
-> Federated Learning (FL) has gained significant attention as it facilitates
-> collaborative machine learning among multiple clients without centralizing
-> their data on a server. FL ensures the privacy of participating clients by
-> locally storing their data, which creates new challenges in fairness.
-> Traditional debiasing methods assume centralized access to sensitive
-> information, rendering them impractical for the FL setting. Additionally, FL is
-> more susceptible to fairness issues than centralized machine learning due to
-> the diverse client data sources that may be associated with group information.
-> Therefore, training a fair model in FL without access to client local data is
-> important and challenging. This paper presents AFed, a straightforward yet
-> effective framework for promoting group fairness in FL. The core idea is to
-> circumvent restricted data access by learning the global data distribution.
-> This paper proposes two approaches: AFed-G, which uses a conditional generator
-> trained on the server side, and AFed-GAN, which improves upon AFed-G by
-> training a conditional GAN on the client side. We augment the client data with
-> the generated samples to help remove bias. Our theoretical analysis justifies
-> the proposed methods, and empirical results on multiple real-world datasets
-> demonstrate a substantial improvement in AFed over several baselines.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a framework for promoting group fairness in Federated Learning (FL), with no emphasis on Large Language Models (LLMs), prompt engineering, or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [TARDiS : Text Augmentation for Refining Diversity and Separability](https://arxiv.org/abs/http://arxiv.org/abs/2501.02739v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02739v1
-
-**Abstract:**
-> Text augmentation (TA) is a critical technique for text classification,
-> especially in few-shot settings. This paper introduces a novel LLM-based TA
-> method, TARDiS, to address challenges inherent in the generation and alignment
-> stages of two-stage TA methods. For the generation stage, we propose two
-> generation processes, SEG and CEG, incorporating multiple class-specific
-> prompts to enhance diversity and separability. For the alignment stage, we
-> introduce a class adaptation (CA) method to ensure that generated examples
-> align with their target classes through verification and modification.
-> Experimental results demonstrate TARDiS's effectiveness, outperforming
-> state-of-the-art LLM-based TA methods in various few-shot text classification
-> tasks. An in-depth analysis confirms the detailed behaviors at each stage.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on a novel text augmentation method (TARDiS) for improving few-shot text classification tasks, using LLMs as a tool, rather than focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) to improve LLM performance through prompt manipulation.
-
----
-
-## [Interpretable Recognition of Fused Magnesium Furnace Working Conditions
-  with Deep Convolutional Stochastic Configuration Networks](https://arxiv.org/abs/http://arxiv.org/abs/2501.02740v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02740v1
-
-**Abstract:**
-> To address the issues of weak generalization capability and
-> interpretability in working condition recognition models for a fused magnesium
-> furnace, this paper proposes an interpretable working condition recognition
-> method based on deep convolutional stochastic configuration networks (DCSCNs).
-> Firstly, a supervised learning mechanism is employed to generate physically
-> meaningful Gaussian differential convolution kernels. An incremental method is
-> utilized to construct a DCSCNs model, ensuring the convergence of recognition
-> errors in a hierarchical manner and avoiding the iterative optimization process
-> of convolutional kernel parameters using the widely used backpropagation
-> algorithm. The independent coefficient of channel feature maps is defined to
-> obtain the visualization results of feature class activation maps for the fused
-> magnesium furnace. A joint reward function is constructed based on the
-> recognition accuracy, the interpretable trustworthiness evaluation metrics, and
-> the model parameter quantity. Reinforcement learning (RL) is applied to
-> adaptively prune the convolutional kernels of the DCSCNs model, aiming to build
-> a compact, high-performing, and interpretable network. The experimental results
-> demonstrate that the proposed method outperforms the other deep learning
-> approaches in terms of recognition accuracy and interpretability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary focus criteria, as it is centered on developing a deep convolutional stochastic configuration network for recognizing working conditions in a fused magnesium furnace, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions.
-
----
-
-## [Visual Large Language Models for Generalized and Specialized
-  Applications](https://arxiv.org/abs/http://arxiv.org/abs/2501.02765v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02765v1
-
-**Abstract:**
-> Visual-language models (VLM) have emerged as a powerful tool for learning a
-> unified embedding space for vision and language. Inspired by large language
-> models, which have demonstrated strong reasoning and multi-task capabilities,
-> visual large language models (VLLMs) are gaining increasing attention for
-> building general-purpose VLMs. Despite the significant progress made in VLLMs,
-> the related literature remains limited, particularly from a comprehensive
-> application perspective, encompassing generalized and specialized applications
-> across vision (image, video, depth), action, and language modalities. In this
-> survey, we focus on the diverse applications of VLLMs, examining their usage
-> scenarios, identifying ethical considerations and challenges, and discussing
-> future directions for their development. By synthesizing these contents, we aim
-> to provide a comprehensive guide that will pave the way for future innovations
-> and broader applications of VLLMs. The paper list repository is available:
-> https://github.com/JackYFL/awesome-VLLMs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on Visual Large Language Models (VLLMs) for generalized and specialized applications across vision, action, and language modalities, with no primary focus on engineering, design, or optimization of textual input prompts specifically for Large Language Models (LLMs) as required.
-
----
-
-## [Are GNNs Effective for Multimodal Fault Diagnosis in Microservice
-  Systems?](https://arxiv.org/abs/http://arxiv.org/abs/2501.02766v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02766v1
-
-**Abstract:**
-> Fault diagnosis in microservice systems has increasingly embraced multimodal
-> observation data for a holistic and multifaceted view of the system, with Graph
-> Neural Networks (GNNs) commonly employed to model complex service dependencies.
-> However, despite the intuitive appeal, there remains a lack of compelling
-> justification for the adoption of GNNs, as no direct evidence supports their
-> necessity or effectiveness. To critically evaluate the current use of GNNs, we
-> propose DiagMLP, a simple topology-agnostic baseline as a substitute for GNNs
-> in fault diagnosis frameworks. Through experiments on five public datasets, we
-> surprisingly find that DiagMLP performs competitively with and even outperforms
-> GNN-based methods in fault diagnosis tasks, indicating that the current
-> paradigm of using GNNs to model service dependencies has not yet demonstrated a
-> tangible contribution. We further discuss potential reasons for this
-> observation and advocate shifting the focus from solely pursuing novel model
-> designs to developing challenging datasets, standardizing preprocessing
-> protocols, and critically evaluating the utility of advanced deep learning
-> modules.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on evaluating Graph Neural Networks (GNNs) for multimodal fault diagnosis in microservice systems, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [Enhancing Trustworthiness of Graph Neural Networks with Rank-Based
-  Conformal Training](https://arxiv.org/abs/http://arxiv.org/abs/2501.02767v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02767v1
-
-**Abstract:**
-> Graph Neural Networks (GNNs) have been widely used in a variety of fields
-> because of their great potential in representing graph-structured data.
-> However, the lack of rigorous uncertainty estimates limits their application
-> in high-stakes settings. Conformal Prediction (CP) can produce statistically guaranteed
-> uncertainty estimates by using the classifier's probability estimates to obtain
-> prediction sets, which contain the true class with a user-specified
-> probability. In this paper, we propose a Rank-based CP during training
-> framework for GNNs (RCP-GNN) for reliable uncertainty estimates to enhance the
-> trustworthiness of GNNs in the node classification scenario. By exploiting rank
-> information of the classifier's outcome, prediction sets with desired coverage
-> rate can be efficiently constructed. The strategy of CP during training with
-> differentiable rank-based conformity loss function is further explored to adapt
-> prediction sets according to network topology information. In this way, the
-> composition of prediction sets can be guided by the goal of jointly reducing
-> inefficiency and probability estimation errors. Extensive experiments on
-> several real-world datasets show that our model achieves any pre-defined target
-> marginal coverage while significantly reducing the inefficiency compared with
-> state-of-the-art methods.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on enhancing Graph Neural Networks (GNNs) with Conformal Prediction for reliable uncertainty estimates, which does not meet the MUST criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs)."
-}
-
----
-
-## [ICFNet: Integrated Cross-modal Fusion Network for Survival Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.02778v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02778v1
-
-**Abstract:**
-> Survival prediction is a crucial task in the medical field and is essential
-> for optimizing treatment options and resource allocation. However, current
-> methods often rely on limited data modalities, resulting in suboptimal
-> performance. In this paper, we propose an Integrated Cross-modal Fusion Network
-> (ICFNet) that integrates histopathology whole slide images, genomic expression
-> profiles, patient demographics, and treatment protocols. Specifically, three
-> types of encoders, a residual orthogonal decomposition module and a unification
-> fusion module are employed to merge multi-modal features to enhance prediction
-> accuracy. Additionally, a balanced negative log-likelihood loss function is
-> designed to ensure fair training across different patients. Extensive
-> experiments demonstrate that our ICFNet outperforms state-of-the-art algorithms
-> on five public TCGA datasets, including BLCA, BRCA, GBMLGG, LUAD, and UCEC, and
-> shows its potential to support clinical decision-making and advance precision
-> medicine. The codes are available at: https://github.com/binging512/ICFNet.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a new network (ICFNet) for survival prediction in the medical field, integrating various data modalities, and does not meet the criteria of focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Hybrid deep convolution model for lung cancer detection with transfer
-  learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.02785v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02785v1
-
-**Abstract:**
-> Advances in healthcare research have significantly enhanced our understanding
-> of disease mechanisms, diagnostic precision, and therapeutic options. Yet, lung
-> cancer remains one of the leading causes of cancer-related mortality worldwide
-> due to challenges in early and accurate diagnosis. While current lung cancer
-> detection models show promise, there is considerable potential for further
-> improving the accuracy for timely intervention. To address this challenge, we
-> introduce a hybrid deep convolution model leveraging transfer learning, named
-> the Maximum Sensitivity Neural Network (MSNN). MSNN is designed to improve the
-> precision of lung cancer detection by refining sensitivity and specificity.
-> This model has surpassed existing deep learning approaches through experimental
-> validation, achieving an accuracy of 98% and a sensitivity of 97%. By
-> overlaying sensitivity maps onto lung Computed Tomography (CT) scans, it
-> enables the visualization of regions most indicative of malignant or benign
-> classifications. This innovative method demonstrates exceptional performance in
-> distinguishing lung cancer with minimal false positives, thereby enhancing the
-> accuracy of medical diagnoses.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a new deep convolution model for lung cancer detection, which falls under medical subjects and does not address prompt engineering for Large Language Models (LLMs) as required.
-
----
-
-## [Segmenting Text and Learning Their Rewards for Improved RLHF in Language
-  Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.02790v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02790v1
-
-**Abstract:**
-> Reinforcement learning from human feedback (RLHF) has been widely adopted to
-> align language models (LMs) with human preference. Prior RLHF works typically
-> take a bandit formulation, which, though intuitive, ignores the sequential
-> nature of LM generation and can suffer from the sparse reward issue. While
-> recent works propose dense token-level RLHF, treating each token as an action
-> may be oversubtle to proper reward assignment. In this paper, we seek to get
-> the best of both by training and utilizing a segment-level reward model, which
-> assigns a reward to each semantically complete text segment that spans over a
-> short sequence of tokens. For reward learning, our method allows dynamic text
-> segmentation and compatibility with standard sequence-preference datasets. For
-> effective RL-based LM training against segment reward, we generalize the
-> classical scalar bandit reward normalizers into location-aware normalizer
-> functions and interpolate the segment reward for further densification. With
-> these designs, our method performs competitively on three popular RLHF
-> benchmarks for LM policy: AlpacaEval 2.0, Arena-Hard, and MT-Bench. Ablation
-> studies are conducted to further demonstrate our method.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on improving Reinforcement Learning from Human Feedback (RLHF) for Language Models through segment-level reward modeling, rather than specifically investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts."
-}
-
----
-
-## [Enhancing Lifelong Multi-Agent Path Finding with Cache Mechanism](https://arxiv.org/abs/http://arxiv.org/abs/2501.02803v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02803v1
-
-**Abstract:**
-> Multi-Agent Path Finding (MAPF), which focuses on finding collision-free
-> paths for multiple robots, is crucial in autonomous warehouse operations.
-> Lifelong MAPF (L-MAPF), where agents are continuously reassigned new targets
-> upon completing their current tasks, offers a more realistic approximation of
-> real-world warehouse scenarios. While cache storage systems can enhance
-> efficiency and reduce operational costs, existing approaches primarily rely on
-> expectations and mathematical models, often without adequately addressing the
-> challenges of multi-robot planning and execution. In this paper, we introduce a
-> novel mechanism called Lifelong MAPF with Cache Mechanism (L-MAPF-CM), which
-> integrates high-level cache storage with low-level path planning. We have
-> introduced a new type of map grid called cache for temporary item storage.
-> Additionally, we introduced a task assigner (TA) with a locking mechanism to
-> bridge the gap between the new cache grid and L-MAPF algorithm. The TA
-> dynamically allocates target locations to agents based on their status in
-> various scenarios. We evaluated L-MAPF-CM using different cache replacement
-> policies and task distributions. L-MAPF-CM has demonstrated performance
-> improvements particularly with high cache hit rates and smooth traffic
-> conditions.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on enhancing Multi-Agent Path Finding with a cache mechanism for autonomous warehouse operations, which does not meet the required focus on prompt engineering, design, or optimization specifically for Large Language Models (LLMs) and their textual input prompts.
-
----
-
-## [InpDiffusion: Image Inpainting Localization via Conditional Diffusion
-  Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02816v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02816v1
-
-**Abstract:**
-> As artificial intelligence advances rapidly, particularly with the advent of
-> GANs and diffusion models, the accuracy of Image Inpainting Localization (IIL)
-> has become increasingly challenging. Current IIL methods face two main
-> challenges: a tendency towards overconfidence, leading to incorrect
-> predictions; and difficulty in detecting subtle tampering boundaries in
-> inpainted images. In response, we propose a new paradigm that treats IIL as a
-> conditional mask generation task utilizing diffusion models. Our method,
-> InpDiffusion, utilizes the denoising process enhanced by the integration of
-> image semantic conditions to progressively refine predictions. During
-> denoising, we employ edge conditions and introduce a novel edge supervision
-> strategy to enhance the model's perception of edge details in inpainted
-> objects. Balancing the diffusion model's stochastic sampling with edge
-> supervision of tampered image regions mitigates the risk of incorrect
-> predictions from overconfidence and prevents the loss of subtle boundaries that
-> can result from overly stochastic processes. Furthermore, we propose an
-> innovative Dual-stream Multi-scale Feature Extractor (DMFE) for extracting
-> multi-scale features, enhancing feature representation by considering both
-> semantic and edge conditions of the inpainted images. Extensive experiments
-> across challenging datasets demonstrate that the InpDiffusion significantly
-> outperforms existing state-of-the-art methods in IIL tasks, while also
-> showcasing excellent generalization capabilities and robustness.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary criteria as it focuses on image inpainting localization using conditional diffusion models, with no apparent connection to Large Language Models (LLMs), prompt engineering, or textual input prompts. The subject matter falls under image generation and processing, which is explicitly excluded.
-
----
-
-## [Enhanced Rooftop Solar Panel Detection by Efficiently Aggregating Local
-  Features](https://arxiv.org/abs/http://arxiv.org/abs/2501.02840v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02840v1
-
-**Abstract:**
-> In this paper, we present an enhanced Convolutional Neural Network
-> (CNN)-based rooftop solar photovoltaic (PV) panel detection approach using
-> satellite images. We propose to use a pre-trained CNN-based model to extract the
-> local convolutional features of rooftops. These local features are then
-> combined using the Vectors of Locally Aggregated Descriptors (VLAD) technique
-> to obtain rooftop-level global features, which are then used to train
-> traditional Machine Learning (ML) models to identify rooftop images that do and
-> do not contain PV panels. On the dataset used in this study, the proposed
-> approach achieved rooftop-PV classification scores exceeding the predefined
-> threshold of 0.9 across all three cities for each of the feature extractor
-> networks evaluated. Moreover, we propose a 3-phase approach to enable efficient
-> utilization of the previously trained models on a new city or region with
-> limited labelled data. We illustrate the effectiveness of this 3-phase approach
-> for the multi-city rooftop-PV detection task.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on enhancing rooftop solar panel detection using CNN and ML techniques with satellite images, and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts as required.
-
----
-
-## [IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks
-  by Efficient Human Preference Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.02869v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02869v1
-
-**Abstract:**
-> Recent research on large language models (LLMs), which are pre-trained on
-> massive general-purpose corpora, has achieved breakthroughs in responding to
-> human queries. However, these methods face challenges, including insufficient
-> data to support extensive pre-training and an inability to align responses
-> with users' instructions. To address these issues, we introduce a medical
-> instruction dataset, CMedINS, containing six medical instructions derived from
-> actual medical tasks, which effectively fine-tunes LLMs in conjunction with
-> other data. Subsequently, we launch our medical model, IIMedGPT, employing an
-> efficient preference alignment method, Direct Preference Optimization (DPO). The
-> results show that our final model outperforms existing medical models in
-> medical dialogue. Datasets, code, and model checkpoints will be released upon
-> acceptance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on fine-tuning a Large Language Model for medical tasks using a new dataset and preference alignment method, which violates the 'MUST NOT' criteria: it focuses on the development of new LLM architectures or training methods and is primarily concerned with medical subjects.
-
----
-
-## [Skillful High-Resolution Ensemble Precipitation Forecasting with an
-  Integrated Deep Learning Framework](https://arxiv.org/abs/http://arxiv.org/abs/2501.02905v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02905v1
-
-**Abstract:**
-> High-resolution precipitation forecasts are crucial for providing accurate
-> weather prediction and supporting effective responses to extreme weather
-> events. Traditional numerical models struggle with stochastic subgrid-scale
-> processes, while recent deep learning models often produce blurry results. To
-> address these challenges, we propose a physics-inspired deep learning framework
-> for high-resolution (0.05° × 0.05°) ensemble
-> precipitation forecasting. Trained on ERA5 and CMPA high-resolution
-> precipitation datasets, the framework integrates deterministic and
-> probabilistic components. The deterministic model, based on a 3D
-> SwinTransformer, captures average precipitation at mesoscale resolution and
-> incorporates strategies to enhance performance, particularly for moderate to
-> heavy rainfall. The probabilistic model employs conditional diffusion in latent
-> space to account for uncertainties in residual precipitation at convective
-> scales. During inference, ensemble members are generated by repeatedly sampling
-> latent variables, enabling the model to represent precipitation uncertainty.
-> Our model significantly enhances spatial resolution and forecast accuracy. Rank
-> histogram shows that the ensemble system is reliable and unbiased. In a case
-> study of heavy precipitation in southern China, the model outputs align more
-> closely with observed precipitation distributions than ERA5, demonstrating
-> superior capability in capturing extreme precipitation events. Additionally,
-> 5-day real-time forecasts show good performance in terms of CSI scores.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a deep learning framework for high-resolution precipitation forecasting, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet the primary criteria.
-
----
-
-## [Unsupervised Tomato Split Anomaly Detection using Hyperspectral Imaging
-  and Variational Autoencoders](https://arxiv.org/abs/http://arxiv.org/abs/2501.02921v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02921v1
-
-**Abstract:**
-> Tomato anomalies/damages pose a significant challenge in greenhouse farming.
-> While this method of cultivation benefits from efficient resource utilization,
-> anomalies can significantly degrade the quality of farm produce. A common
-> anomaly associated with tomatoes is splitting, characterized by the development
-> of cracks on the tomato skin, which degrades its quality. Detecting this type
-> of anomaly is challenging due to dynamic variations in appearance and sizes,
-> compounded by dataset scarcity. We address this problem in an unsupervised
-> manner by utilizing a tailored variational autoencoder (VAE) with hyperspectral
-> input. Preliminary analysis of the dataset enabled us to select the optimal
-> range of wavelengths for detecting this anomaly. Our findings indicate that the
-> 530nm - 550nm range is suitable for identifying tomato dry splits. The analysis
-> of reconstruction loss allows us not only to detect the anomalies but also to
-> some degree estimate the anomalous regions.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet any of the 'MUST' criteria: it focuses on anomaly detection in hyperspectral imaging using Variational Autoencoders, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus falling entirely outside the specified scope.
-
----
-
-## [GLFC: Unified Global-Local Feature and Contrast Learning with
-  Mamba-Enhanced UNet for Synthetic CT Generation from CBCT](https://arxiv.org/abs/http://arxiv.org/abs/2501.02992v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02992v2
-
-**Abstract:**
-> Generating synthetic Computed Tomography (CT) images from Cone Beam Computed
-> Tomography (CBCT) is desirable for improving the image quality of CBCT.
-> Existing synthetic CT (sCT) generation methods using Convolutional Neural
-> Networks (CNN) and Transformers often face difficulties in effectively
-> capturing both global and local features and contrasts for high-quality sCT
-> generation. In this work, we propose a Global-Local Feature and Contrast
-> learning (GLFC) framework for sCT generation. First, a Mamba-Enhanced UNet
-> (MEUNet) is introduced by integrating Mamba blocks into the skip connections of
-> a high-resolution UNet for effective global and local feature learning. Second,
-> we propose a Multiple Contrast Loss (MCL) that calculates synthetic loss at
-> different intensity windows to improve quality for both soft tissues and bone
-> regions. Experiments on the SynthRAD2023 dataset demonstrate that GLFC improved
-> the SSIM of sCT from 77.91% to 91.50% compared with the original CBCT, and
-> significantly outperformed several existing methods for sCT generation. The
-> code is available at https://github.com/HiLab-git/GLFC
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on generating synthetic CT images from CBCT using a novel CNN framework (GLFC with Mamba-Enhanced UNet), which does not meet the 'MUST' criteria of focusing on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance."
-}
-
----
-
-## [Quality Estimation based Feedback Training for Improving Pronoun
-  Translation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03008v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03008v1
-
-**Abstract:**
-> Pronoun translation is a longstanding challenge in neural machine translation
-> (NMT), often requiring inter-sentential context to ensure linguistic accuracy.
-> To address this, we introduce ProNMT, a novel framework designed to enhance
-> pronoun and overall translation quality in context-aware machine translation
-> systems. ProNMT leverages Quality Estimation (QE) models and a unique Pronoun
-> Generation Likelihood-Based Feedback mechanism to iteratively fine-tune
-> pre-trained NMT models without relying on extensive human annotations. The
-> framework combines QE scores with pronoun-specific rewards to guide training,
-> ensuring improved handling of linguistic nuances. Extensive experiments
-> demonstrate significant gains in pronoun translation accuracy and general
-> translation quality across multiple metrics. ProNMT offers an efficient,
-> scalable, and context-aware approach to improving NMT systems, particularly in
-> translating context-dependent elements like pronouns.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on improving neural machine translation (NMT) systems through Quality Estimation based feedback training, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs), and does not provide concrete examples of prompts or demonstrate their impact on LLM output.
-
----
-
-## [Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization
-  Degradation for Mathematical Reasoning](https://arxiv.org/abs/http://arxiv.org/abs/2501.03035v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03035v1
-
-**Abstract:**
-> Large language models have achieved significant advancements in complex
-> mathematical reasoning benchmarks, such as MATH. However, their substantial
-> computational requirements present challenges for practical deployment. Model
-> quantization has emerged as an effective strategy to reduce memory usage and
-> computational costs by employing lower precision and bit-width representations.
-> In this study, we systematically evaluate the impact of quantization on
-> mathematical reasoning tasks. We introduce a multidimensional evaluation
-> framework that qualitatively assesses specific capability dimensions and
-> conduct quantitative analyses on the step-by-step outputs of various
-> quantization methods. Our results demonstrate that quantization differentially
-> affects numerical computation and reasoning planning abilities, identifying key
-> areas where quantized models experience performance degradation.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on model quantization for reducing computational costs and its impact on mathematical reasoning tasks, rather than prompt engineering for Large Language Models (LLMs), not meeting the 'MUST' criteria for prompt engineering, LLM performance improvement through prompt manipulation, and providing concrete prompt examples."
-}
-
----
-
-## [Piano Transcription by Hierarchical Language Modeling with Pretrained
-  Roll-based Encoders](https://arxiv.org/abs/http://arxiv.org/abs/2501.03038v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03038v2
-
-**Abstract:**
-> Automatic Music Transcription (AMT), aiming to get musical notes from raw
-> audio, typically uses frame-level systems with piano-roll outputs or language
-> model (LM)-based systems with note-level predictions. However, frame-level
-> systems require manual thresholding, while the LM-based systems struggle with
-> long sequences. In this paper, we propose a hybrid method combining pre-trained
-> roll-based encoders with an LM decoder to leverage the strengths of both
-> methods. Besides, our approach employs a hierarchical prediction strategy,
-> first predicting onset and pitch, then velocity, and finally offset. The
-> hierarchical prediction strategy reduces computational costs by breaking down
-> long sequences into different hierarchies. Evaluated on two benchmark
-> roll-based encoders, our method outperforms traditional piano-roll outputs by 0.01
-> and 0.022 in onset-offset-velocity F1 score, demonstrating its potential as a
-> performance-enhancing plug-in for arbitrary roll-based music transcription
-> encoders.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on Automatic Music Transcription using a hybrid method combining pre-trained roll-based encoders with a language model decoder, rather than prompt engineering for Large Language Models (LLMs). The subject is outside the specified scope, particularly excluding applications of generative AI other than text generation driven by LLMs, such as audio generation in this case.
-
----
-
-## [Single-Channel Distance-Based Source Separation for Mobile GPU in
-  Outdoor and Indoor Environments](https://arxiv.org/abs/http://arxiv.org/abs/2501.03045v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03045v1
-
-**Abstract:**
-> This study emphasizes the significance of exploring distance-based source
-> separation (DSS) in outdoor environments. Unlike existing studies that
-> primarily focus on indoor settings, the proposed model is designed to capture
-> the unique characteristics of outdoor audio sources. It incorporates advanced
-> techniques, including a two-stage conformer block, a linear relation-aware
-> self-attention (RSA), and a TensorFlow Lite GPU delegate. While the linear RSA
-> may not capture physical cues as explicitly as the quadratic RSA, the linear
-> RSA enhances the model's context awareness, leading to improved performance on
-> the DSS that requires an understanding of physical cues in outdoor and indoor
-> environments. The experimental results demonstrated that the proposed model
-> overcomes the limitations of existing approaches and considerably enhances
-> energy efficiency and real-time inference speed on mobile devices.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on source separation for audio signals in outdoor and indoor environments, utilizing mobile GPU and TensorFlow, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Personalized Fashion Recommendation with Image Attributes and Aesthetics
-  Assessment](https://arxiv.org/abs/http://arxiv.org/abs/2501.03085v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03085v1
-
-**Abstract:**
-> Personalized fashion recommendation is a difficult task because 1) the
-> decisions are highly correlated with users' aesthetic appetite, which previous
-> work frequently overlooks, and 2) many new items are constantly rolling out
-> that cause strict cold-start problems in the popular identity (ID)-based
-> recommendation methods. These new items are critical to recommend because of
-> trend-driven consumerism. In this work, we aim to provide more accurate
-> personalized fashion recommendations and solve the cold-start problem by
-> converting available information, especially images, into two attribute graphs
-> focusing on optimized image utilization and noise-reducing user modeling.
-> Compared with previous methods that separate image and text as two components,
-> the proposed method combines image and text information to create a richer
-> attributes graph. Capitalizing on the advancement of large language and vision
-> models, we experiment with extracting fine-grained attributes efficiently and
-> as desired using two different prompts. Preliminary experiments on the IQON3000
-> dataset have shown that the proposed method achieves competitive accuracy
-> compared with baselines.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on personalized fashion recommendation using image attributes and aesthetics, combining image and text information, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs). While it mentions using two different prompts for attribute extraction, prompt engineering is not the central concern, but rather a minor component within the larger system.
-
----
-
-## [LangFair: A Python Package for Assessing Bias and Fairness in Large
-  Language Model Use Cases](https://arxiv.org/abs/http://arxiv.org/abs/2501.03112v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03112v1
-
-**Abstract:**
-> Large Language Models (LLMs) have been observed to exhibit bias in numerous
-> ways, potentially creating or worsening outcomes for specific groups identified
-> by protected attributes such as sex, race, sexual orientation, or age. To help
-> address this gap, we introduce LangFair, an open-source Python package that
-> aims to equip LLM practitioners with the tools to evaluate bias and fairness
-> risks relevant to their specific use cases. The package offers functionality to
-> easily generate evaluation datasets, comprised of LLM responses to
-> use-case-specific prompts, and subsequently calculate applicable metrics for
-> the practitioner's use case. To guide in metric selection, LangFair offers an
-> actionable decision framework.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on assessing bias and fairness in Large Language Models using a Python package, rather than on the engineering, design, or optimization of prompts specifically for LLMs to improve their text generation performance.
-
----
-
-## [From Models to Network Topologies: A Topology Inference Attack in
-  Decentralized Federated Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.03119v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03119v1
-
-**Abstract:**
-> Federated Learning (FL) is widely recognized as a privacy-preserving machine
-> learning paradigm due to its model-sharing mechanism that avoids direct data
-> exchange. However, model training inevitably leaves exploitable traces that can
-> be used to infer sensitive information. In Decentralized FL (DFL), the overlay
-> topology significantly influences its models' convergence, robustness, and
-> security. This study explores the feasibility of inferring the overlay topology
-> of DFL systems based solely on model behavior, introducing a novel Topology
-> Inference Attack. A taxonomy of topology inference attacks is proposed,
-> categorizing them by the attacker's capabilities and knowledge. Practical
-> attack strategies are developed for different scenarios, and quantitative
-> experiments are conducted to identify key factors influencing the attack
-> effectiveness. Experimental results demonstrate that analyzing only the public
-> models of individual nodes can accurately infer the DFL topology, underscoring
-> the risk of sensitive information leakage in DFL systems. This finding offers
-> valuable insights for improving privacy preservation in decentralized learning
-> environments.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on topology inference attacks in Decentralized Federated Learning, which does not meet the MUST criteria: it does not focus on engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompts."
-}
-
----
-
-## [PRMBench: A Fine-grained and Challenging Benchmark for Process-Level
-  Reward Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03124v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03124v2
-
-**Abstract:**
-> Process-level Reward Models (PRMs) are crucial for complex reasoning and
-> decision-making tasks, where each intermediate step plays an important role in
-> the reasoning process. Since language models are prone to various types of
-> errors during the reasoning process, PRMs are required to possess nuanced
-> capabilities for detecting various implicit error types in real-world
-> scenarios. However, current benchmarks primarily focus on step correctness,
-> failing to evaluate PRMs' performance systematically. To address this gap, we
-> introduce PRMBench, a process-level benchmark specifically designed to assess
-> the fine-grained error detection capabilities of PRMs. PRMBench comprises 6,216
-> carefully designed problems and 83,456 step-level labels, evaluating models
-> across multiple dimensions, including simplicity, soundness, and sensitivity.
-> In our experiments on 15 models, spanning both open-source PRMs and
-> closed-source large language models prompted as critic models, we uncover
-> significant weaknesses in current PRMs. These findings underscore the
-> challenges inherent in process-level evaluation and highlight key directions
-> for future research. We hope PRMBench can be a robust bench for advancing
-> research on PRM evaluation and development.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a benchmark for evaluating Process-Level Reward Models (PRMs) and their error detection capabilities, rather than engineering, designing, or optimizing prompts specifically for Large Language Models (LLMs). While LLMs are mentioned as being used as critic models, prompt engineering is not the central concern.
-
----
-
-## [Geometry Restoration and Dewarping of Camera-Captured Document Images](https://arxiv.org/abs/http://arxiv.org/abs/2501.03145v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03145v2
-
-**Abstract:**
-> This research focuses on developing a method for restoring the topology of
-> digital images of paper documents captured by a camera, using algorithms for
-> detection, segmentation, geometry restoration, and dewarping. Our methodology
-> employs deep learning (DL) for document outline detection, followed by computer
-> vision (CV) to create a topological 2D grid using cubic polynomial
-> interpolation and correct nonlinear distortions by remapping the image. Using
-> classical CV methods makes the document topology restoration process more
-> efficient and faster, as it requires significantly fewer computational
-> resources and memory. We developed a new pipeline for automatic document
-> dewarping and reconstruction, along with a framework and annotated dataset to
-> demonstrate its efficiency. Our experiments confirm the promise of our
-> methodology and its superiority over existing benchmarks (including mobile apps
-> and popular DL solutions, such as RectiNet, DocGeoNet, and DocTr++) both
-> visually and in terms of document readability via Optical Character Recognition
-> (OCR) and geometry restoration metrics. This paves the way for creating
-> high-quality digital copies of paper documents and enhancing the efficiency of
-> OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a method for restoring and dewarping document images using computer vision and deep learning, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Automated Generation of Challenging Multiple-Choice Questions for Vision
-  Language Model Evaluation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03225v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03225v1
-
-**Abstract:**
-> The rapid development of vision language models (VLMs) demands rigorous and
-> reliable evaluation. However, current visual question answering (VQA)
-> benchmarks often depend on open-ended questions, making accurate evaluation
-> difficult due to the variability in natural language responses. To address
-> this, we introduce AutoConverter, an agentic framework that automatically
-> converts these open-ended questions into multiple-choice format, enabling
-> objective evaluation while reducing the costly question creation process. Our
-> experiments demonstrate that AutoConverter can generate correct and challenging
-> multiple-choice questions, with VLMs demonstrating consistently similar or
-> lower accuracy on these questions compared to human-created ones. Using
-> AutoConverter, we construct VMCBench, a benchmark created by transforming 20
-> existing VQA datasets into a unified multiple-choice format, totaling 9,018
-> questions. We comprehensively evaluate 33 state-of-the-art VLMs on VMCBench,
-> setting a new standard for scalable, consistent, and reproducible VLM
-> evaluation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on evaluating Vision Language Models (VLMs) through automated generation of multiple-choice questions, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs) to improve text generation, as required by the criteria.
-
----
-
-## [LightGNN: Simple Graph Neural Network for Recommendation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03228v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03228v2
-
-**Abstract:**
-> Graph neural networks (GNNs) have demonstrated superior performance in
-> collaborative recommendation through their ability to conduct high-order
-> representation smoothing, effectively capturing structural information within
-> users' interaction patterns. However, existing GNN paradigms face significant
-> challenges in scalability and robustness when handling large-scale, noisy, and
-> real-world datasets. To address these challenges, we present LightGNN, a
-> lightweight and distillation-based GNN pruning framework designed to
-> substantially reduce model complexity while preserving essential collaboration
-> modeling capabilities. Our LightGNN framework introduces a computationally
-> efficient pruning module that adaptively identifies and removes redundant edges
-> and embedding entries for model compression. The framework is guided by a
-> resource-friendly hierarchical knowledge distillation objective, whose
-> intermediate layer augments the observed graph to maintain performance,
-> particularly in high-rate compression scenarios. Extensive experiments on
-> public datasets demonstrate LightGNN's effectiveness, significantly improving
-> both computational efficiency and recommendation accuracy. Notably, LightGNN
-> achieves an 80% reduction in edge count and 90% reduction in embedding entries
-> while maintaining performance comparable to more complex state-of-the-art
-> baselines. The implementation of our LightGNN framework is available at the
-> github repository: https://github.com/HKUDS/LightGNN.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a lightweight Graph Neural Network (GNN) for recommendation systems, with no mention of Large Language Models (LLMs), prompt engineering, or textual input manipulation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Gaussian Masked Autoencoders](https://arxiv.org/abs/http://arxiv.org/abs/2501.03229v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03229v1
-
-**Abstract:**
-> This paper explores Masked Autoencoders (MAE) with Gaussian Splatting. While
-> reconstructive self-supervised learning frameworks such as MAE learn good
-> semantic abstractions, they are not trained for explicit spatial awareness. Our
-> approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic
-> abstractions and spatial understanding jointly. Like MAE, it reconstructs the
-> image end-to-end in the pixel space, but beyond MAE, it also introduces an
-> intermediate, 3D Gaussian-based representation and renders images via
-> splatting. We show that GMAE can enable various zero-shot learning capabilities
-> of spatial understanding (e.g., figure-ground segmentation, image layering,
-> edge detection, etc.) while preserving the high-level semantics of
-> self-supervised representation quality from MAE. To our knowledge, we are the
-> first to employ Gaussian primitives in an image representation learning
-> framework beyond optimization-based single-scene reconstructions. We believe
-> GMAE will inspire further research in this direction and contribute to
-> developing next-generation techniques for modeling high-fidelity visual data.
-> More details at https://brjathu.github.io/gmae
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a new image representation learning framework (Gaussian Masked Autoencoders) for spatial understanding and visual data modeling, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Backdoor Token Unlearning: Exposing and Defending Backdoors in
-  Pretrained Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03272v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03272v1
-
-**Abstract:**
-> Supervised fine-tuning has become the predominant method for adapting large
-> pretrained models to downstream tasks. However, recent studies have revealed
-> that these models are vulnerable to backdoor attacks, where even a small number
-> of malicious samples can successfully embed backdoor triggers into the model.
-> While most existing defense methods focus on post-training backdoor defense,
-> efficiently defending against backdoor attacks during the training phase remains
-> largely unexplored. To address this gap, we propose a novel defense method
-> called Backdoor Token Unlearning (BTU), which proactively detects and
-> neutralizes trigger tokens during the training stage. Our work is based on two
-> key findings: 1) backdoor learning causes distinctive differences between
-> backdoor token parameters and clean token parameters in word embedding layers,
-> and 2) the success of backdoor attacks heavily depends on backdoor token
-> parameters. The BTU defense leverages these properties to identify aberrant
-> embedding parameters and subsequently removes backdoor behaviors using a
-> fine-grained unlearning technique. Extensive evaluations across three datasets
-> and four types of backdoor attacks demonstrate that BTU effectively defends
-> against these threats while preserving the model's performance on primary
-> tasks. Our code is available at https://github.com/XDJPH/BTU.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on defending against backdoor attacks in pretrained language models during the training phase, rather than prompt engineering for text-based interactions with Large Language Models (LLMs), failing to meet the 'MUST' criteria related to prompt engineering and manipulation of textual input prompts.
-
----
-
-## [Strategic Fusion Optimizes Transformer Compression](https://arxiv.org/abs/http://arxiv.org/abs/2501.03273v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03273v1
-
-**Abstract:**
-> This study investigates transformer model compression by systematically
-> pruning its layers. We evaluated 14 pruning strategies across nine diverse
-> datasets, including 12 strategies based on different signals obtained from
-> layer activations, mutual information, gradients, weights, and attention. To
-> address the limitations of single-signal strategies, we introduced two fusion
-> strategies, linear regression and random forest, which combine individual
-> strategies (i.e., strategic fusion), for more informed pruning decisions.
-> Additionally, we applied knowledge distillation to mitigate any accuracy loss
-> during layer pruning. Our results reveal that random forest strategic fusion
-> outperforms individual strategies in seven out of nine datasets and achieves
-> near-optimal performance in the other two. The distilled random forest
-> surpasses the original accuracy in six datasets and mitigates accuracy drops in
-> the remaining three. Knowledge distillation also improves the accuracy-to-size
-> ratio by an average factor of 18.84 across all datasets. Supported by
-> mathematical foundations and biological analogies, our findings suggest that
-> strategically combining multiple signals can lead to efficient, high-performing
-> transformer models for resource-constrained applications.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on transformer model compression through layer pruning and knowledge distillation, not on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [ComMer: a Framework for Compressing and Merging User Data for
-  Personalization](https://arxiv.org/abs/http://arxiv.org/abs/2501.03276v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03276v1
-
-**Abstract:**
-> Large Language Models (LLMs) excel at a wide range of tasks, but adapting
-> them to new data, particularly for personalized applications, poses significant
-> challenges due to resource and computational constraints. Existing methods
-> either rely on exposing fresh data to the model through the prompt, which is
-> limited by context size and computationally expensive at inference time, or
-> fine-tuning, which incurs substantial training and update costs. In this paper,
-> we introduce ComMer - Compress and Merge - a novel framework that efficiently
-> personalizes LLMs by compressing users' documents into compact representations,
-> which are then merged and fed into a frozen LLM. We evaluate ComMer on two
-> types of personalization tasks - personalized skill learning, using the tweet
-> paraphrasing dataset and the personalized news headline generation dataset from
-> the LaMP benchmark, and knowledge-intensive, using the PerLTQA dataset. Our
-> experiments demonstrate that in constrained inference budget scenarios ComMer
-> achieves superior quality in skill learning tasks, while highlighting
-> limitations in knowledge-intensive settings due to the loss of detailed
-> information. These results offer insights into trade-offs and potential
-> optimizations in multi-document compression for personalization.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a framework for compressing and merging user data to personalize Large Language Models (LLMs), rather than on the engineering, design, or optimization of prompts specifically for LLMs. It does not investigate or propose methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps
-  and Vision Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03288v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03288v1
-
-**Abstract:**
-> The rise of large language models (LLMs) like ChatGPT has significantly
-> improved automated code generation, enhancing software development efficiency.
-> However, this introduces challenges in academia, particularly in distinguishing
-> between human-written and LLM-generated code, which complicates issues of
-> academic integrity. Existing detection methods, such as pre-trained models and
-> watermarking, face limitations in adaptability and computational efficiency. In
-> this paper, we propose a novel detection method using 2D token probability maps
-> combined with vision models, preserving spatial code structures such as
-> indentation and brackets. By transforming code into log probability matrices
-> and applying vision models like Vision Transformers (ViT) and ResNet, we
-> capture both content and structure for more accurate detection. Our method
-> shows robustness across multiple programming languages and improves upon
-> traditional detectors, offering a scalable and computationally efficient
-> solution for identifying LLM-generated code.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on detecting LLM-generated code using vision models, rather than on the engineering, design, or optimization of prompts for Large Language Models (LLMs), failing to meet the primary criteria of focusing on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Analyzing Bias in Swiss Federal Supreme Court Judgments Using Facebook's
-  Holistic Bias Dataset: Implications for Language Model Training](https://arxiv.org/abs/http://arxiv.org/abs/2501.03324v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03324v1
-
-**Abstract:**
-> Natural Language Processing (NLP) is vital for computers to process and
-> respond accurately to human language. However, biases in training data can
-> introduce unfairness, especially in predicting legal judgment. This study
-> focuses on analyzing biases within the Swiss Judgment Prediction Dataset
-> (SJP-Dataset). Our aim is to ensure unbiased factual descriptions essential for
-> fair decision making by NLP models in legal contexts. We analyze the dataset
-> using social bias descriptors from the Holistic Bias dataset and employ
-> advanced NLP techniques, including attention visualization, to explore the
-> impact of dispreferred descriptors on model predictions. The study identifies
-> biases and examines their influence on model behavior. Challenges include
-> dataset imbalance and token limits affecting model performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on analyzing biases in a legal dataset for fair NLP model training, not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not provide concrete examples of prompt manipulation impacting LLM output.
-
----
-
-## [Existential Crisis: A Social Robot's Reason for Being](https://arxiv.org/abs/http://arxiv.org/abs/2501.03376v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03376v1
-
-**Abstract:**
-> As robots become ever more important in our daily lives, there is a growing need
-> for understanding how they're perceived by people. This study aims to
-> investigate how the user perception of robots is influenced by displays of
-> personality. Using LLMs and speech-to-text technology, we designed a
-> within-subject study to compare two conditions: a personality-driven robot and
-> a purely task-oriented, personality-neutral robot. Twelve participants,
-> recruited from the Socially Intelligent Robotics course at Vrije Universiteit
-> Amsterdam, interacted with a Nao robot tasked with asking them a set of medical
-> questions under both conditions. After completing both interactions, the
-> participants completed a user experience questionnaire measuring their
-> emotional states and robot perception using standardized questionnaires from
-> the SRI and Psychology literature.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on user perception of robots with displayed personality, utilizing LLMs as a tool, but prompt engineering for Large Language Models is not the primary concern; the core subject revolves around social robotics and user experience, not the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Over-the-Air Fair Federated Learning via Multi-Objective Optimization](https://arxiv.org/abs/http://arxiv.org/abs/2501.03392v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03392v1
-
-**Abstract:**
-> In federated learning (FL), heterogeneity among the local dataset
-> distributions of clients can result in unsatisfactory performance for some,
-> leading to an unfair model. To address this challenge, we propose an
-> over-the-air fair federated learning algorithm (OTA-FFL), which leverages
-> over-the-air computation to train fair FL models. By formulating FL as a
-> multi-objective minimization problem, we introduce a modified Chebyshev
-> approach to compute adaptive weighting coefficients for gradient aggregation in
-> each communication round. To enable efficient aggregation over the multiple
-> access channel, we derive analytical solutions for the optimal transmit scalars
-> at the clients and the de-noising scalar at the parameter server. Extensive
-> experiments demonstrate the superiority of OTA-FFL in achieving fairness and
-> robust performance compared to existing methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on federated learning, multi-objective optimization, and over-the-air computation, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Enhanced Importance Sampling through Latent Space Exploration in
-  Normalizing Flows](https://arxiv.org/abs/http://arxiv.org/abs/2501.03394v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03394v1
-
-**Abstract:**
-> Importance sampling is a rare event simulation technique used in Monte Carlo
-> simulations to bias the sampling distribution towards the rare event of
-> interest. By assigning appropriate weights to sampled points, importance
-> sampling allows for more efficient estimation of rare events or tails of
-> distributions. However, importance sampling can fail when the proposal
-> distribution does not effectively cover the target distribution. In this work,
-> we propose a method for more efficient sampling by updating the proposal
-> distribution in the latent space of a normalizing flow. Normalizing flows learn
-> an invertible mapping from a target distribution to a simpler latent
-> distribution. The latent space can be more easily explored during the search
-> for a proposal distribution, and samples from the proposal distribution are
-> recovered in the space of the target distribution via the invertible mapping.
-> We empirically validate our methodology on simulated robotics applications such
-> as autonomous racing and aircraft ground collision avoidance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the criteria as it focuses on enhancing importance sampling in Monte Carlo simulations using normalizing flows, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing all 'MUST' criteria.
-
----
-
-## [BoundingDocs: a Unified Dataset for Document Question Answering with
-  Spatial Annotations](https://arxiv.org/abs/http://arxiv.org/abs/2501.03403v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03403v1
-
-**Abstract:**
-> We present a unified dataset for document Question-Answering (QA), which is
-> obtained by combining several public datasets related to Document AI and visually
-> rich document understanding (VRDU). Our main contribution is twofold: on the
-> one hand we reformulate existing Document AI tasks, such as Information
-> Extraction (IE), into a Question-Answering task, making it a suitable resource
-> for training and evaluating Large Language Models; on the other hand, we
-> release the OCR of all the documents and include the exact position of the
-> answer to be found in the document image as a bounding box. Using this dataset,
-> we explore the impact of different prompting techniques (that might include
-> bounding box information) on the performance of open-weight models, identifying
-> the most effective approaches for document comprehension.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "Although the paper mentions exploring the impact of different prompting techniques on LLM performance, its primary focus is on presenting a unified dataset for document Question Answering, making prompt engineering a secondary concern rather than the central focus."
-}
-
----
-
-## [SALT: Sales Autocompletion Linked Business Tables Dataset](https://arxiv.org/abs/http://arxiv.org/abs/2501.03413v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03413v1
-
-**Abstract:**
-> Foundation models, particularly those that incorporate Transformer
-> architectures, have demonstrated exceptional performance in domains such as
-> natural language processing and image processing. Adapting these models to
-> structured data, like tables, however, introduces significant challenges. These
-> difficulties are even more pronounced when addressing multi-table data linked
-> via foreign key, which is prevalent in the enterprise realm and crucial for
-> empowering business use cases. Despite its substantial impact, research
-> focusing on such linked business tables within enterprise settings remains a
-> significantly important yet underexplored domain. To address this, we introduce
-> a curated dataset sourced from an Enterprise Resource Planning (ERP) system,
-> featuring extensive linked tables. This dataset is specifically designed to
-> support research endeavors in table representation learning. By providing
-> access to authentic enterprise data, our goal is to potentially enhance the
-> effectiveness and applicability of models for real-world business contexts.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the 'MUST' criteria, as its primary focus is on introducing a dataset for table representation learning in enterprise settings, rather than engineering, designing, or optimizing prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output."
-}
-
----
-
-## [Optimization Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.03443v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03443v1
-
-**Abstract:**
-> This article introduces the concept of optimization learning, a methodology
-> to design optimization proxies that learn the input/output mapping of
-> parametric optimization problems. These optimization proxies are trustworthy by
-> design: they compute feasible solutions to the underlying optimization
-> problems, provide quality guarantees on the returned solutions, and scale to
-> large instances. Optimization proxies are differentiable programs that combine
-> traditional deep learning technology with repair or completion layers to
-> produce feasible solutions. The article shows that optimization proxies can be
-> trained end-to-end in a self-supervised way. It presents methodologies to
-> provide performance guarantees and to scale optimization proxies to large-scale
-> optimization problems. The potential of optimization proxies is highlighted
-> through applications in power systems and, in particular, real-time risk
-> assessment and security-constrained optimal power flow.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering, design, or optimization for Large Language Models (LLMs). Instead, it introduces 'optimization learning' for solving parametric optimization problems, unrelated to LLMs or text-based interactions.
-
----
-
-## [LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification
-  and Tagging](https://arxiv.org/abs/http://arxiv.org/abs/2501.03464v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03464v1
-
-**Abstract:**
-> Transformers have set new benchmarks in audio processing tasks, leveraging
-> self-attention mechanisms to capture complex patterns and dependencies within
-> audio data. However, their focus on pairwise interactions limits their ability
-> to process the higher-order relations essential for identifying distinct audio
-> objects. To address this limitation, this work introduces the Local-Higher
-> Order Graph Neural Network (LHGNN), a graph based model that enhances feature
-> understanding by integrating local neighbourhood information with higher-order
-> data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio
-> relationships. Evaluation of the model on three publicly available audio
-> datasets shows that it outperforms Transformer-based models across all
-> benchmarks while operating with substantially fewer parameters. Moreover, LHGNN
-> demonstrates a distinct advantage in scenarios lacking ImageNet pretraining,
-> establishing its effectiveness and efficiency in environments where extensive
-> pretraining data is unavailable.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new graph neural network model (LHGNN) for audio classification and tagging, primarily dealing with audio data and disregarding Large Language Models (LLMs) and text-based prompt engineering, thus failing to meet all the 'MUST' criteria.
-
----
-
-## [MTRAG: A Multi-Turn Conversational Benchmark for Evaluating
-  Retrieval-Augmented Generation Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.03468v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03468v1
-
-**Abstract:**
-> Retrieval-augmented generation (RAG) has recently become a very popular task
-> for Large Language Models (LLMs). Evaluating them on multi-turn RAG
-> conversations, where the system is asked to generate a response to a question
-> in the context of a preceding conversation, is an important and often overlooked
-> task with several additional challenges. We present MTRAG: an end-to-end
-> human-generated multi-turn RAG benchmark that reflects several real-world
-> properties across diverse dimensions for evaluating the full RAG pipeline.
-> MTRAG contains 110 conversations averaging 7.7 turns each across four domains
-> for a total of 842 tasks. We also explore automation paths via synthetic data
-> and LLM-as-a-Judge evaluation. Our human and automatic evaluations show that
-> even state-of-the-art LLM RAG systems struggle on MTRAG. We demonstrate the
-> need for strong retrieval and generation systems that can handle later turns,
-> unanswerable questions, non-standalone questions, and multiple domains. MTRAG
-> is available at https://github.com/ibm/mt-rag-benchmark.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on introducing a benchmark (MTRAG) for evaluating Retrieval-Augmented Generation Systems, rather than specifically focusing on the engineering, design, or optimization of prompts for Large Language Models (LLMs) and demonstrating their impact on LLM output.
-
----
-
-## [Reading with Intent -- Neutralizing Intent](https://arxiv.org/abs/http://arxiv.org/abs/2501.03475v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03475v1
-
-**Abstract:**
-> Queries to large language models (LLMs) can be divided into two parts: the
-> instruction/question and the accompanying context. The context for
-> retrieval-augmented generation (RAG) systems in most benchmarks comes from
-> Wikipedia or Wikipedia-like texts which are written in a neutral and factual
-> tone. However, when RAG systems retrieve internet-based content, they encounter
-> text with diverse tones and linguistic styles, introducing challenges for
-> downstream tasks. The Reading with Intent task addresses this issue by
-> evaluating how varying tones in context passages affect model performance.
-> Building on prior work that focused on sarcasm, we extend this paradigm by
-> constructing a dataset where context passages are transformed to 11 distinct
-> emotions using a better synthetic data generation approach. Using this dataset,
-> we train an emotion translation model to systematically adapt passages to
-> specified emotional tones. The human evaluation shows that the LLM fine-tuned
-> to become the emotion-translator benefited from the synthetically generated
-> data. Finally, the emotion-translator is used in the Reading with Intent task
-> to transform the passages to a neutral tone. By neutralizing the passages, it
-> mitigates the challenges posed by sarcastic passages and improves overall
-> results on this task by about 3%.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on training an emotion translation model to adapt passages to specified emotional tones and fine-tuning an LLM for emotion translation, rather than the engineering, design, or optimization of prompts specifically for LLMs. While it mentions improving results on the Reading with Intent task by neutralizing passages, the core subject is emotion adaptation in LLM training, not prompt engineering for text-based interactions with LLMs."
-}
-
----
-
-## [Align-Pro: A Principled Approach to Prompt Optimization for LLM
-  Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.03486v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03486v1
-
-**Abstract:**
-> The alignment of large language models (LLMs) with human values is critical
-> as these models become increasingly integrated into various societal and
-> decision-making processes. Traditional methods, such as reinforcement learning
-> from human feedback (RLHF), achieve alignment by fine-tuning model parameters,
-> but these approaches are often computationally expensive and impractical when
-> models are frozen or inaccessible for parameter modification. In contrast,
-> prompt optimization is a viable alternative to RLHF for LLM alignment. While
-> the existing literature has shown empirical promise of prompt optimization, its
-> theoretical underpinning remains under-explored. We address this gap by
-> formulating prompt optimization as an optimization problem and try to provide
-> theoretical insights into the optimality of such a framework. To analyze the
-> performance of the prompt optimization, we study theoretical suboptimality
-> bounds and provide insights in terms of how prompt optimization depends upon
-> the given prompter and target model. We also provide empirical validation
-> through experiments on various datasets, demonstrating that prompt optimization
-> can effectively align LLMs, even when parameter fine-tuning is not feasible.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper discusses prompt optimization for LLM alignment, its primary focus is on LLM alignment through a theoretical optimization framework, rather than specifically on engineering, design, or optimization of prompts for improving LLM performance through textual input manipulation, as required.
-
----
-
-## [Can Deep Learning Trigger Alerts from Mobile-Captured Images?](https://arxiv.org/abs/http://arxiv.org/abs/2501.03499v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03499v1
-
-**Abstract:**
-> Our research presents a comprehensive approach to leveraging mobile camera
-> image data for real-time air quality assessment and recommendation. We develop
-> a regression-based Convolutional Neural Network model and tailor it explicitly
-> for air quality prediction by exploiting the inherent relationship between
-> output parameters. As a result, the Mean Squared Error of 0.0077 and 0.0112
-> obtained for 2 and 5 pollutants respectively outperforms existing models.
-> Furthermore, we aim to verify the common practice of augmenting the original
-> dataset with a view to introducing more variation in the training phase. It is
-> one of our most significant contributions that our experimental results
-> demonstrate minimal accuracy differences between the original and augmented
-> datasets. Finally, a real-time, user-friendly dashboard is implemented which
-> dynamically displays the Air Quality Index and pollutant values derived from
-> captured mobile camera images. Users' health conditions are considered to
-> recommend whether a location is suitable based on current air quality metrics.
-> Overall, this research contributes to verification of data augmentation
-> techniques, CNN-based regression modelling for air quality prediction, and
-> user-centric air quality monitoring through mobile technology. The proposed
-> system offers practical solutions for individuals to make informed
-> environmental health and well-being decisions.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on leveraging mobile camera images for air quality assessment using a Convolutional Neural Network model, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, thereby failing to meet all 'MUST' criteria.
-
----
-
-## [Vocal Tract Length Warped Features for Spoken Keyword Spotting](https://arxiv.org/abs/http://arxiv.org/abs/2501.03523v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03523v1
-
-**Abstract:**
-> In this paper, we propose several methods that incorporate vocal tract length
-> (VTL) warped features for spoken keyword spotting (KWS). The first method,
-> VTL-independent KWS, involves training a single deep neural network (DNN) that
-> utilizes VTL features with various warping factors. During training, a specific
-> VTL feature is randomly selected per epoch, allowing the exploration of VTL
-> variations. During testing, the VTL features with different warping factors of
-> a test utterance are scored against the DNN and combined with equal weight. The
-> second method scores the conventional features of a test utterance (without
-> VTL warping) against the DNN. The third method, VTL-concatenation KWS,
-> concatenates VTL warped features to form high-dimensional features for KWS.
-> Evaluations carried out on the English Google Command dataset demonstrate that
-> the proposed methods improve the accuracy of KWS.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on spoken keyword spotting using vocal tract length warped features with deep neural networks, which does not meet the criteria of primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or investigating methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Deep Learning within Tabular Data: Foundations, Challenges, Advances and
-  Future Directions](https://arxiv.org/abs/http://arxiv.org/abs/2501.03540v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03540v1
-
-**Abstract:**
-> Tabular data remains one of the most prevalent data types across a wide range
-> of real-world applications, yet effective representation learning for this
-> domain poses unique challenges due to its irregular patterns, heterogeneous
-> feature distributions, and complex inter-column dependencies. This survey
-> provides a comprehensive review of state-of-the-art techniques in tabular data
-> representation learning, structured around three foundational design elements:
-> training data, neural architectures, and learning objectives. Unlike prior
-> surveys that focus primarily on either architecture design or learning
-> strategies, we adopt a holistic perspective that emphasizes the universality
-> and robustness of representation learning methods across diverse downstream
-> tasks. We examine recent advances in data augmentation and generation,
-> specialized neural network architectures tailored to tabular data, and
-> innovative learning objectives that enhance representation quality.
-> Additionally, we highlight the growing influence of self-supervised learning
-> and the adaptation of transformer-based foundation models for tabular data. Our
-> review is based on a systematic literature search using rigorous inclusion
-> criteria, encompassing 127 papers published since 2020 in top-tier conferences
-> and journals. Through detailed analysis and comparison, we identify emerging
-> trends, critical gaps, and promising directions for future research, aiming to
-> guide the development of more generalizable and effective tabular data
-> representation methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on tabular data representation learning using deep learning techniques, with no apparent connection to Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for
-  Text-to-Image Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03544v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03544v1
-
-**Abstract:**
-> Text-to-image (T2I) models have been shown to be vulnerable to misuse,
-> particularly in generating not-safe-for-work (NSFW) content, raising serious
-> ethical concerns. In this work, we present PromptGuard, a novel content
-> moderation technique that draws inspiration from the system prompt mechanism in
-> large language models (LLMs) for safety alignment. Unlike LLMs, T2I models lack
-> a direct interface for enforcing behavioral guidelines. Our key idea is to
-> optimize a safety soft prompt that functions as an implicit system prompt
-> within the T2I model's textual embedding space. This universal soft prompt (P*)
-> directly moderates NSFW inputs, enabling safe yet realistic image generation
-> without altering the inference efficiency or requiring proxy models. Extensive
-> experiments across three datasets demonstrate that PromptGuard effectively
-> mitigates NSFW content generation while preserving high-quality benign outputs.
-> PromptGuard is 7.8 times faster than prior content moderation methods,
-> surpassing eight state-of-the-art defenses with an optimal unsafe ratio down to
-> 5.84%.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on text-to-image models and content moderation, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) as required, violating the 'MUST NOT' criteria concerning applications of generative AI other than text generation driven by LLMs.
-
----
-
-## [Rethinking Adversarial Attacks in Reinforcement Learning from Policy
-  Distribution Perspective](https://arxiv.org/abs/http://arxiv.org/abs/2501.03562v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03562v2
-
-**Abstract:**
-> Deep Reinforcement Learning (DRL) suffers from uncertainties and inaccuracies
-> in the observation signal in real-world applications. Adversarial attack is an
-> effective method for evaluating the robustness of DRL agents. However, existing
-> attack methods targeting individual sampled actions have limited impacts on the
-> overall policy distribution, particularly in continuous action spaces. To
-> address these limitations, we propose the Distribution-Aware Projected Gradient
-> Descent attack (DAPGD). DAPGD uses distribution similarity as the gradient
-> perturbation input to attack the policy network, which leverages the entire
-> policy distribution rather than relying on individual samples. We utilize the
-> Bhattacharyya distance in DAPGD to measure policy similarity, enabling
-> sensitive detection of subtle but critical differences between probability
-> distributions. Our experiment results demonstrate that DAPGD achieves SOTA
-> results compared to the baselines in three robot navigation tasks, achieving an
-> average 22.03% higher reward drop compared to the best baseline.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria: it focuses on Reinforcement Learning, adversarial attacks, and policy distribution, with no primary emphasis on engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Applying Large Language Models in Knowledge Graph-based Enterprise
-  Modeling: Challenges and Opportunities](https://arxiv.org/abs/http://arxiv.org/abs/2501.03566v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03566v1
-
-**Abstract:**
-> The role of large language models (LLMs) in enterprise modeling has recently
-> started to shift from academic research to that of industrial applications.
-> Thereby, LLMs represent a further building block for the machine-supported
-> generation of enterprise models. In this paper we employ a knowledge
-> graph-based approach for enterprise modeling and investigate the potential
-> benefits of LLMs in this context. In addition, the findings of an expert survey
-> and ChatGPT-4o-based experiments demonstrate that LLM-based model generations
-> exhibit minimal variability, yet remain constrained to specific tasks, with
-> reliability declining for more intricate tasks. The survey results further
-> suggest that the supervision and intervention of human modeling experts are
-> essential to ensure the accuracy and integrity of the generated models.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is on applying LLMs in enterprise modeling using knowledge graphs, not on prompt engineering for text-based interactions with LLMs, failing to meet the core subject requirement."
-}
-
----
-
-## [RecKG: Knowledge Graph for Recommender Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.03598v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03598v1
-
-**Abstract:**
-> Knowledge graphs have proven successful in integrating heterogeneous data
-> across various domains. However, there remains a noticeable dearth of research
-> on their seamless integration among heterogeneous recommender systems, despite
-> knowledge graph-based recommender systems garnering extensive research
-> attention. This study aims to fill this gap by proposing RecKG, a standardized
-> knowledge graph for recommender systems. RecKG ensures the consistent
-> representation of entities across different datasets, accommodating diverse
-> attribute types for effective data integration. Through a meticulous
-> examination of various recommender system datasets, we select attributes for
-> RecKG, ensuring standardized formatting through consistent naming conventions.
-> By these characteristics, RecKG can seamlessly integrate heterogeneous data
-> sources, enabling the discovery of additional semantic information within the
-> integrated knowledge graph. We apply RecKG to standardize real-world datasets,
-> subsequently developing an application for RecKG using a graph database.
-> Finally, we validate RecKG's achievement in interoperability through a
-> qualitative evaluation between RecKG and other studies.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on integrating knowledge graphs for recommender systems, not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria 1 and 2.
-
----
-
-## [MHGNet: Multi-Heterogeneous Graph Neural Network for Traffic Prediction](https://arxiv.org/abs/http://arxiv.org/abs/2501.03635v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03635v1
-
-**Abstract:**
-> In recent years, traffic flow prediction has played a crucial role in the
-> management of intelligent transportation systems. However, traditional
-> forecasting methods often model non-Euclidean low-dimensional traffic data as a
-> simple graph with single-type nodes and edges, failing to capture similar
-> trends among nodes of the same type. To address this limitation, this paper
-> proposes MHGNet, a novel framework for modeling spatiotemporal
-> multi-heterogeneous graphs. Within this framework, the STD Module decouples
-> single-pattern traffic data into multi-pattern traffic data through feature
-> mappings of timestamp embedding matrices and node embedding matrices.
-> Subsequently, the Node Clusterer leverages the Euclidean distance between nodes
-> and different types of limit points to perform clustering with O(N) time
-> complexity. The nodes within each cluster undergo residual subgraph convolution
-> within the spatiotemporal fusion subgraphs generated by the DSTGG Module,
-> followed by processing in the SIE Module for node repositioning and
-> redistribution of weights. To validate the effectiveness of MHGNet, this paper
-> conducts extensive ablation studies and quantitative evaluations on four widely
-> used benchmarks, demonstrating its superior performance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary focus criteria, as it concentrates on traffic prediction using Multi-Heterogeneous Graph Neural Networks, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to satisfy the 'MUST' criteria.
-
----
-
-## [Effective and Efficient Mixed Precision Quantization of Speech
-  Foundation Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03643v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03643v2
-
-**Abstract:**
-> This paper presents a novel mixed-precision quantization approach for speech
-> foundation models that tightly integrates mixed-precision learning and
-> quantized model parameter estimation into one single model compression stage.
-> Experiments conducted on LibriSpeech dataset with fine-tuned wav2vec2.0-base
-> and HuBERT-large models suggest the resulting mixed-precision quantized models
-> increased the lossless compression ratio by factors up to 1.7x and 1.9x over
-> the respective uniform-precision and two-stage mixed-precision quantized
-> baselines that perform precision learning and model parameters quantization in
-> separate and disjointed stages, while incurring no statistically significant word error
-> rate (WER) increase over the 32-bit full-precision models. The system
-> compression time of wav2vec2.0-base and HuBERT-large models is reduced by up to
-> 1.9 and 1.5 times over the two-stage mixed-precision baselines, while both
-> produce lower WERs. The best-performing 3.5-bit mixed-precision quantized
-> HuBERT-large model produces a lossless compression ratio of 8.6x over the
-> 32-bit full-precision system.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on model compression through mixed-precision quantization of speech foundation models, not on prompt engineering or the manipulation of textual input prompts for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria."
-}
-
----
-
-## [A Diversity-Enhanced Knowledge Distillation Model for Practical Math
-  Word Problem Solving](https://arxiv.org/abs/http://arxiv.org/abs/2501.03670v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03670v1
-
-**Abstract:**
-> Math Word Problem (MWP) solving, a critical task in natural language
-> processing, has garnered significant research interest in recent years. Various
-> recent studies heavily rely on Seq2Seq models and their extensions (e.g.,
-> Seq2Tree and Graph2Tree) to generate mathematical equations. While effective,
-> these models struggle to generate diverse but counterpart solution equations,
-> limiting their generalization across various math problem scenarios. In this
-> paper, we introduce a novel Diversity-enhanced Knowledge Distillation (DivKD)
-> model for practical MWP solving. Our approach proposes an adaptive diversity
-> distillation method, in which a student model learns diverse equations by
-> selectively transferring high-quality knowledge from a teacher model.
-> Additionally, we design a diversity prior-enhanced student model to better
-> capture the diversity distribution of equations by incorporating a conditional
-> variational auto-encoder. Extensive experiments on four MWP benchmark
-> datasets demonstrate that our approach achieves higher answer accuracy than
-> strong baselines while maintaining high efficiency for practical applications.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses primarily on developing a new model (Diversity-enhanced Knowledge Distillation) for math word problem solving, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs). Prompt engineering is not the central concern, and the paper does not demonstrate the impact of textual input prompts on LLM output."
-}
-
----
-
-## [SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks](https://arxiv.org/abs/http://arxiv.org/abs/2501.03676v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03676v2
-
-**Abstract:**
-> In this work, we build upon the offline reinforcement learning algorithm TD7,
-> which incorporates State-Action Learned Embeddings (SALE) and a prioritized
-> experience replay buffer (LAP). We propose a model-free actor-critic algorithm
-> that integrates ensemble Q-networks and a gradient diversity penalty from EDAC.
-> The ensemble Q-networks introduce penalties to guide the actor network toward
-> in-distribution actions, effectively addressing the challenge of
-> out-of-distribution actions. Meanwhile, the gradient diversity penalty
-> encourages diverse Q-value gradients, further suppressing overestimation for
-> out-of-distribution actions. Additionally, our method retains an adjustable
-> behavior cloning (BC) term that directs the actor network toward dataset
-> actions during early training stages, while gradually reducing its influence as
-> the precision of the Q-ensemble improves. These enhancements work
-> synergistically to improve the stability and precision of the training.
-> Experimental results on the D4RL MuJoCo benchmarks demonstrate that our
-> algorithm achieves higher convergence speed, stability, and performance
-> compared to existing methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on offline reinforcement learning with ensemble Q-networks and therefore fails the mandatory criteria of concentrating on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) and of manipulating textual input prompts to improve LLM performance.
-
----
-
-## [SLAM: Towards Efficient Multilingual Reasoning via Selective Language
-  Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.03681v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03681v1
-
-**Abstract:**
-> Despite the significant improvements achieved by large language models (LLMs)
-> in English reasoning tasks, these models continue to struggle with multilingual
-> reasoning. Recent studies leverage a full-parameter and two-stage training
-> paradigm to teach models to first understand non-English questions and then
-> reason. However, this method suffers from both substantial computational
-> resource consumption and catastrophic forgetting. The fundamental cause is that,
-> with the primary goal of enhancing multilingual comprehension, an excessive
-> number of irrelevant layers and parameters are tuned during the first stage.
-> Given our findings that the representation learning of languages is merely
-> conducted in lower-level layers, we propose an efficient multilingual reasoning
-> alignment approach that precisely identifies and fine-tunes the layers
-> responsible for handling multilingualism. Experimental results show that our
-> method, SLAM, only tunes 6 layers' feed-forward sub-layers including 6.5-8% of
-> all parameters within 7B and 13B LLMs, achieving superior average performance
-> than all strong baselines across 10 languages. Meanwhile, SLAM only involves
-> one training stage, reducing training time by 4.1-11.9× compared to the
-> two-stage method.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on efficient multilingual reasoning via selective language alignment, involving a one-stage training method for LLMs, which aligns more with the development of new training methods (violation of MUST NOT 1) rather than the engineering, design, or optimization of prompts for Large Language Models.
-
----
-
-## [Exploring Molecule Generation Using Latent Space Graph Diffusion](https://arxiv.org/abs/http://arxiv.org/abs/2501.03696v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03696v1
-
-**Abstract:**
-> Generating molecular graphs is a challenging task due to their discrete
-> nature and the competitive objectives involved. Diffusion models have emerged
-> as SOTA approaches in data generation across various modalities. For molecular
-> graphs, graph neural networks (GNNs) as a diffusion backbone have achieved
-> impressive results. Latent space diffusion, where diffusion occurs in a
-> low-dimensional space via an autoencoder, has demonstrated computational
-> efficiency. However, the literature on latent space diffusion for molecular
-> graphs is scarce, and no commonly accepted best practices exist. In this work,
-> we explore different approaches and hyperparameters, contrasting generative
-> flow models (denoising diffusion, flow matching, heat dissipation) and
-> architectures (GNNs and E(3)-equivariant GNNs). Our experiments reveal a high
-> sensitivity to the choice of approach and design decisions. Code is made
-> available at
-> github.com/Prashanth-Pombala/Molecule-Generation-using-Latent-Space-Graph-Diffusion.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on generating molecular graphs using latent space graph diffusion, involving graph neural networks and diffusion models, without any mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet all 'MUST' criteria."
-}
-
----
-
-## [Unsupervised Speech Segmentation: A General Approach Using Speech
-  Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03711v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03711v1
-
-**Abstract:**
-> In this paper, we introduce an unsupervised approach for Speech Segmentation,
-> which builds on previously researched approaches, e.g., Speaker Diarization,
-> while being applicable to an inclusive set of acoustic-semantic distinctions,
-> paving a path towards a general Unsupervised Speech Segmentation approach.
-> Unlike traditional speech and audio segmentation, which mainly focuses on
-> spectral changes in the input signal, e.g., phone segmentation, our approach
-> tries to segment the spoken utterance into chunks with differing
-> acoustic-semantic styles, focusing on acoustic-semantic information that does
-> not translate well into text, e.g., emotion or speaker. While most Speech
-> Segmentation tasks only handle one style change, e.g., emotion diarization, our
-> approach tries to handle multiple acoustic-semantic style changes. Leveraging
-> recent advances in Speech Language Models (SLMs), we propose a simple
-> unsupervised method to segment a given speech utterance. We empirically
-> demonstrate the effectiveness of the proposed approach by considering several
-> setups. Results suggest that the proposed method is superior to the evaluated
-> baselines on boundary detection, segment purity, and over-segmentation. Code is
-> available at
-> https://github.com/avishaiElmakies/unsupervised_speech_segmentation_using_slm.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on unsupervised speech segmentation using Speech Language Models (SLMs), which does not meet the criteria of primarily focusing on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [Materialist: Physically Based Editing Using Single-Image Inverse
-  Rendering](https://arxiv.org/abs/http://arxiv.org/abs/2501.03717v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03717v1
-
-**Abstract:**
-> To perform image editing based on single-view, inverse physically based
-> rendering, we present a method combining a learning-based approach with
-> progressive differentiable rendering. Given an image, our method leverages
-> neural networks to predict initial material properties. Progressive
-> differentiable rendering is then used to optimize the environment map and
-> refine the material properties with the goal of closely matching the rendered
-> result to the input image. We require only a single image while other inverse
-> rendering methods based on the rendering equation require multiple views. In
-> comparison to single-view methods that rely on neural renderers, our approach
-> achieves more realistic light material interactions, accurate shadows, and
-> global illumination. Furthermore, with optimized material properties and
-> illumination, our method enables a variety of tasks, including physically based
-> material editing, object insertion, and relighting. We also propose a method
-> for material transparency editing that operates effectively without requiring
-> full scene geometry. Compared with methods based on Stable Diffusion, our
-> approach offers stronger interpretability and more realistic light refraction
-> based on empirical results.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on image editing using single-image inverse physically based rendering, involving neural networks and differentiable rendering for image generation, which falls under image generation and not text-based interactions with Large Language Models (LLMs), thus failing to meet the primary criteria of focusing on prompt engineering for LLMs.
-
----
-
-## [SelectiveFinetuning: Enhancing Transfer Learning in Sleep Staging
-  through Selective Domain Alignment](https://arxiv.org/abs/http://arxiv.org/abs/2501.03764v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03764v1
-
-**Abstract:**
-> In practical sleep stage classification, a key challenge is the variability
-> of EEG data across different subjects and environments. Differences in
-> physiology, age, health status, and recording conditions can lead to domain
-> shifts between data. These domain shifts often result in decreased model
-> accuracy and reliability, particularly when the model is applied to new data
-> with characteristics different from those it was originally trained on, which
-> is a typical manifestation of negative transfer. To address this, we propose
-> SelectiveFinetuning in this paper. Our method utilizes a pretrained Multi
-> Resolution Convolutional Neural Network (MRCNN) to extract EEG features,
-> capturing the distinctive characteristics of different sleep stages. To
-> mitigate the effect of domain shifts, we introduce a domain aligning mechanism
-> that employs Earth Mover Distance (EMD) to evaluate and select source domain
-> data closely matching the target domain. By finetuning the model with selective
-> source data, our SelectiveFinetuning enhances the model's performance on target
-> domain that exhibits domain shifts compared to the data used for training.
-> Experimental results show that our method outperforms existing baselines,
-> offering greater robustness and adaptability in practical scenarios where data
-> distributions are often unpredictable.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on enhancing transfer learning in sleep staging through selective domain alignment using a Convolutional Neural Network (CNN), not on the engineering, design, or optimization of prompts for Large Language Models (LLMs), and does not investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Three-dimensional attention Transformer for state evaluation in
-  real-time strategy games](https://arxiv.org/abs/http://arxiv.org/abs/2501.03832v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03832v1
-
-**Abstract:**
-> Situation assessment in Real-Time Strategy (RTS) games is crucial for
-> understanding decision-making in complex adversarial environments. However,
-> existing methods remain limited in processing multi-dimensional feature
-> information and temporal dependencies. Here we propose a tri-dimensional
-> Space-Time-Feature Transformer (TSTF Transformer) architecture, which
-> efficiently models battlefield situations through three independent but
-> cascaded modules: spatial attention, temporal attention, and feature attention.
-> On a dataset comprising 3,150 adversarial experiments, the 8-layer TSTF
-> Transformer demonstrates superior performance: achieving 58.7% accuracy in the
-> early game (~4% progress), significantly outperforming the conventional
-> Timesformer's 41.8%; reaching 97.6% accuracy in the mid-game (~40% progress)
-> while maintaining low performance variation (standard deviation 0.114).
-> Meanwhile, this architecture requires fewer parameters (4.75M) compared to the
-> baseline model (5.54M). Our study not only provides new insights into situation
-> assessment in RTS games but also presents an innovative paradigm for
-> Transformer-based multi-dimensional temporal modeling.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new Transformer architecture for situation assessment in Real-Time Strategy games, rather than on prompt engineering for Large Language Models (LLMs). It does not investigate, analyze, or propose methods for improving LLM performance through textual input prompt manipulation, and there is no mention of prompts or LLMs in the provided abstract.
-
----
-
-## [SCC-YOLO: An Improved Object Detector for Assisting in Brain Tumor
-  Diagnosis](https://arxiv.org/abs/http://arxiv.org/abs/2501.03836v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03836v2
-
-**Abstract:**
-> Brain tumors can result in neurological dysfunction, alterations in cognitive
-> and psychological states, increased intracranial pressure, and the occurrence
-> of seizures, thereby presenting a substantial risk to human life and health.
-> The You Only Look Once (YOLO) series models have demonstrated superior accuracy
-> in object detection for medical imaging. In this paper, we develop a novel
-> SCC-YOLO architecture by integrating the SCConv attention mechanism into
-> YOLOv9. The SCConv module reconstructs an efficient convolutional module by
-> reducing spatial and channel redundancy among features, thereby enhancing the
-> learning of image features. We investigate the impact of integrating different
-> attention mechanisms with the YOLOv9 model on brain tumor image detection using
-> both the Br35H dataset and our self-made dataset (Brain_Tumor_Dataset).
-> Experimental results show that on the Br35H dataset, SCC-YOLO achieved a 0.3%
-> improvement in mAP50 compared to YOLOv9, while on our self-made dataset,
-> SCC-YOLO exhibited a 0.5% improvement over YOLOv9. SCC-YOLO has reached
-> state-of-the-art performance in brain tumor detection. Source code is available
-> at: https://jihulab.com/healthcare-information-studio/SCC-YOLO/-/tree/master
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new object detection architecture (SCC-YOLO) for medical imaging (brain tumor diagnosis) and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through the manipulation of textual input prompts.
-
----
-
-## [Explainable Reinforcement Learning via Temporal Policy Decomposition](https://arxiv.org/abs/http://arxiv.org/abs/2501.03902v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03902v1
-
-**Abstract:**
-> We investigate the explainability of Reinforcement Learning (RL) policies
-> from a temporal perspective, focusing on the sequence of future outcomes
-> associated with individual actions. In RL, value functions compress information
-> about rewards collected across multiple trajectories and over an infinite
-> horizon, allowing a compact form of knowledge representation. However, this
-> compression obscures the temporal details inherent in sequential
-> decision-making, presenting a key challenge for interpretability. We present
-> Temporal Policy Decomposition (TPD), a novel explainability approach that
-> explains individual RL actions in terms of their Expected Future Outcome (EFO).
-> These explanations decompose generalized value functions into a sequence of
-> EFOs, one for each time step up to a prediction horizon of interest, revealing
-> insights into when specific outcomes are expected to occur. We leverage
-> fixed-horizon temporal difference learning to devise an off-policy method for
-> learning EFOs for both optimal and suboptimal actions, enabling contrastive
-> explanations consisting of EFOs for different state-action pairs. Our
-> experiments demonstrate that TPD generates accurate explanations that (i)
-> clarify the policy's future strategy and anticipated trajectory for a given
-> action and (ii) improve understanding of the reward composition, facilitating
-> fine-tuning of the reward function to align with human expectations.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on explainability in Reinforcement Learning (RL) policies, primarily dealing with temporal policy decomposition and value functions, with no indication of prompt engineering for Large Language Models (LLMs) or manipulation of textual input prompts to improve LLM performance."
-}
-
----
-
-## [Exploring the Potential of Large Language Models in Public
-  Transportation: San Antonio Case Study](https://arxiv.org/abs/http://arxiv.org/abs/2501.03904v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03904v1
-
-**Abstract:**
-> The integration of large language models (LLMs) into public transit systems
-> presents a transformative opportunity to enhance urban mobility. This study
-> explores the potential of LLMs to revolutionize public transportation
-> management within the context of San Antonio's transit system. Leveraging the
-> capabilities of LLMs in natural language processing and data analysis, we
-> investigate their capabilities to optimize route planning, reduce wait times,
-> and provide personalized travel assistance. By utilizing the General Transit
-> Feed Specification (GTFS) and other relevant data, this research aims to
-> demonstrate how LLMs can potentially improve resource allocation, elevate
-> passenger satisfaction, and inform data-driven decision-making in transit
-> operations. A comparative analysis of different ChatGPT models was conducted to
-> assess their ability to understand transportation information, retrieve
-> relevant data, and provide comprehensive responses. Findings from this study
-> suggest that while LLMs hold immense promise for public transit, careful
-> engineering and fine-tuning are essential to realizing their full potential.
-> San Antonio serves as a case study to inform the development of LLM-powered
-> transit systems in other urban environments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on applying Large Language Models to public transportation management, rather than on the engineering, design, or optimization of prompts specifically for LLMs, failing to meet the core subject requirement.
-
----
-
-## [Localizing AI: Evaluating Open-Weight Language Models for Languages of
-  Baltic States](https://arxiv.org/abs/http://arxiv.org/abs/2501.03952v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03952v1
-
-**Abstract:**
-> Although large language models (LLMs) have transformed our expectations of
-> modern language technologies, concerns over data privacy often restrict the use
-> of commercially available LLMs hosted outside of EU jurisdictions. This limits
-> their application in governmental, defence, and other data-sensitive sectors.
-> In this work, we evaluate the extent to which locally deployable open-weight
-> LLMs support lesser-spoken languages such as Lithuanian, Latvian, and Estonian.
-> We examine various size and precision variants of the top-performing
-> multilingual open-weight models, Llama 3, Gemma 2, Phi, and NeMo, on machine
-> translation, multiple-choice question answering, and free-form text generation.
-> The results indicate that while certain models like Gemma 2 perform close to
-> the top commercially available models, many LLMs struggle with these languages.
-> Most surprisingly, however, we find that these models, while showing close to
-> state-of-the-art translation performance, are still prone to lexical
-> hallucinations with errors in at least 1 in 20 words for all open-weight
-> multilingual LLMs.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on evaluating open-weight Language Models for lesser-spoken languages, concerning data privacy and model performance, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs)."
-}
-
----
-
-## [VLM-driven Behavior Tree for Context-aware Task Planning](https://arxiv.org/abs/http://arxiv.org/abs/2501.03968v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03968v2
-
-**Abstract:**
-> The use of Large Language Models (LLMs) for generating Behavior Trees (BTs)
-> has recently gained attention in the robotics community, yet remains in its
-> early stages of development. In this paper, we propose a novel framework that
-> leverages Vision-Language Models (VLMs) to interactively generate and edit BTs
-> that address visual conditions, enabling context-aware robot operations in
-> visually complex environments. A key feature of our approach lies in the
-> conditional control through self-prompted visual conditions. Specifically, the
-> VLM generates BTs with visual condition nodes, where conditions are expressed
-> as free-form text. Another VLM process integrates the text into its prompt and
-> evaluates the conditions against real-world images during robot execution. We
-> validated our framework in a real-world cafe scenario, demonstrating both its
-> feasibility and limitations.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on leveraging Vision-Language Models (VLMs) for context-aware task planning in robotics, with prompt engineering being a secondary aspect used for conditional control, not the central focus of the paper.
-
----
-
-## [ChronoLLM: A Framework for Customizing Large Language Model for Digital
-  Twins generalization based on PyChrono](https://arxiv.org/abs/http://arxiv.org/abs/2501.04062v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04062v1
-
-**Abstract:**
-> Recently, the integration of advanced simulation technologies with artificial
-> intelligence (AI) is revolutionizing science and engineering research.
-> ChronoLlama introduces a novel framework that customizes the open-source LLMs,
-> specifically for code generation, paired with PyChrono for multi-physics
-> simulations. This integration aims to automate and improve the creation of
-> simulation scripts, thus enhancing model accuracy and efficiency. This
-> combination harnesses the speed of AI-driven code generation with the
-> reliability of physics-based simulations, providing a powerful tool for
-> researchers and engineers. Empirical results indicate substantial enhancements
-> in simulation setup speed, accuracy of the generated codes, and overall
-> computational efficiency. ChronoLlama not only expedites the development and
-> testing of multibody systems but also spearheads a scalable, AI-enhanced
-> approach to managing intricate mechanical simulations. This pioneering
-> integration of cutting-edge AI with traditional simulation platforms represents
-> a significant leap forward in automating and optimizing design processes in
-> engineering applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on integrating LLMs with multi-physics simulations for automating code generation in engineering applications, rather than specifically on prompt engineering for Large Language Models. The core subject is the development of a framework for simulation script automation, not novel prompt engineering techniques or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised
-  Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.04108v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04108v1
-
-**Abstract:**
-> An image encoder pre-trained by self-supervised learning can be used as a
-> general-purpose feature extractor to build downstream classifiers for various
-> downstream tasks. However, many studies showed that an attacker can embed a
-> trojan into an encoder such that multiple downstream classifiers built based on
-> the trojaned encoder simultaneously inherit the trojan behavior. In this work,
-> we propose TrojanDec, the first data-free method to identify and recover a test
-> input embedded with a trigger. Given a (trojaned or clean) encoder and a test
-> input, TrojanDec first predicts whether the test input is trojaned. If not, the
-> test input is processed in a normal way to maintain the utility. Otherwise, the
-> test input will be further restored to remove the trigger. Our extensive
-> evaluation shows that TrojanDec can effectively identify the trojan (if any)
-> from a given test input and recover it under state-of-the-art trojan attacks.
-> We further demonstrate by experiments that our TrojanDec outperforms the
-> state-of-the-art defenses.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on detecting and recovering from trojan attacks in self-supervised image encoders, which falls outside the specified criteria: it does not involve Large Language Models (LLMs), prompt engineering for text-based interactions, or text generation, and instead pertains to image processing and security.
-
----
-
-## [BiasGuard: Guardrailing Fairness in Machine Learning Production Systems](https://arxiv.org/abs/http://arxiv.org/abs/2501.04142v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04142v1
-
-**Abstract:**
-> As machine learning (ML) systems increasingly impact critical sectors such as
-> hiring, financial risk assessments, and criminal justice, the imperative to
-> ensure fairness has intensified due to potential negative implications. While
-> much ML fairness research has focused on enhancing training data and processes,
-> addressing the outputs of already deployed systems has received less attention.
-> This paper introduces 'BiasGuard', a novel approach designed to act as a
-> fairness guardrail in production ML systems. BiasGuard leverages Test-Time
-> Augmentation (TTA) powered by Conditional Generative Adversarial Network
-> (CTGAN), a cutting-edge generative AI model, to synthesize data samples
-> conditioned on inverted protected attribute values, thereby promoting equitable
-> outcomes across diverse groups. This method aims to provide equal opportunities
-> for both privileged and unprivileged groups while significantly enhancing the
-> fairness metrics of deployed systems without the need for retraining. Our
-> comprehensive experimental analysis across diverse datasets reveals that
-> BiasGuard enhances fairness by 31% while only reducing accuracy by 0.09%
-> compared to non-mitigated benchmarks. Additionally, BiasGuard outperforms
-> existing post-processing methods in improving fairness, positioning it as an
-> effective tool to safeguard against biases when retraining the model is
-> impractical.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on ensuring fairness in machine learning production systems via Test-Time Augmentation and a generative AI model, rather than specifically on prompt engineering for Large Language Models (LLMs) and the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Reasoning-Enhanced Self-Training for Long-Form Personalized Text
-  Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.04167v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04167v1
-
-**Abstract:**
-> Personalized text generation requires a unique ability of large language
-> models (LLMs) to learn from context that they often do not encounter during
-> their standard training. One way to encourage LLMs to better use personalized
-> context for generating outputs that better align with the user's expectations
-> is to instruct them to reason over the user's past preferences, background
-> knowledge, or writing style. To achieve this, we propose Reasoning-Enhanced
-> Self-Training for Personalized Text Generation (REST-PG), a framework that
-> trains LLMs to reason over personal data during response generation. REST-PG
-> first generates reasoning paths to train the LLM's reasoning abilities and then
-> employs Expectation-Maximization Reinforced Self-Training to iteratively train
-> the LLM based on its own high-reward outputs. We evaluate REST-PG on the
-> LongLaMP benchmark, consisting of four diverse personalized long-form text
-> generation tasks. Our experiments demonstrate that REST-PG achieves significant
-> improvements over state-of-the-art baselines, with an average relative
-> performance gain of 14.5% on the benchmark.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a new framework (REST-PG) for self-training LLMs to enhance personalized text generation, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models. It falls under the exclusion criteria, as it is more about training methods for LLMs than prompt engineering.
-
----
-
-## [Learning to Transfer Human Hand Skills for Robot Manipulations](https://arxiv.org/abs/http://arxiv.org/abs/2501.04169v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04169v1
-
-**Abstract:**
-> We present a method for teaching dexterous manipulation tasks to robots from
-> human hand motion demonstrations. Unlike existing approaches that solely rely
-> on kinematics information without taking into account the plausibility of robot
-> and object interaction, our method directly infers plausible robot manipulation
-> actions from human motion demonstrations. To address the embodiment gap between
-> the human hand and the robot system, our approach learns a joint motion
-> manifold that maps human hand movements, robot hand actions, and object
-> movements in 3D, enabling us to infer one motion component from others. Our key
-> idea is the generation of pseudo-supervision triplets, which pair human,
-> object, and robot motion trajectories synthetically. Through real-world
-> experiments with robot hand manipulation, we demonstrate that our data-driven
-> retargeting method significantly outperforms conventional retargeting
-> techniques, effectively bridging the embodiment gap between human and robotic
-> hands. Website at https://rureadyo.github.io/MocapRobot/.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not focus on prompt engineering for Large Language Models (LLMs), instead concentrating on robotics and teaching manipulation tasks to robots from human hand motion demonstrations, which falls outside the specified criteria."
-}
-
----
-
-## [SNR-EQ-JSCC: Joint Source-Channel Coding with SNR-Based Embedding and
-  Query](https://arxiv.org/abs/http://arxiv.org/abs/2501.04732v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04732v1
-
-**Abstract:**
-> Coping with the impact of dynamic channels is a critical issue in joint
-> source-channel coding (JSCC)-based semantic communication systems. In this
-> paper, we propose a lightweight channel-adaptive semantic coding architecture
-> called SNR-EQ-JSCC. It is built upon the generic Transformer model and achieves
-> channel adaptation (CA) by Embedding the signal-to-noise ratio (SNR) into the
-> attention blocks and dynamically adjusting attention scores through
-> channel-adaptive Queries. Meanwhile, penalty terms are introduced in the loss
-> function to stabilize the training process. Considering that instantaneous SNR
-> feedback may be imperfect, we propose an alternative method that uses only the
-> average SNR, which requires no retraining of SNR-EQ-JSCC. Simulation results
-> conducted on image transmission demonstrate that the proposed SNR-EQ-JSCC
-> outperforms the state-of-the-art SwinJSCC in peak signal-to-noise ratio (PSNR)
-> and perception metrics while only requiring 0.05% of the storage overhead and
-> 6.38% of the computational complexity for CA. Moreover, the channel-adaptive
-> query method demonstrates significant improvements in perception metrics. When
-> instantaneous SNR feedback is imperfect, SNR-EQ-JSCC using only the average SNR
-> still surpasses baseline schemes.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on joint source-channel coding for semantic communication systems, adapting a Transformer model for channel conditions, and does not primarily investigate prompt engineering for Large Language Models (LLMs) or analyze methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [RTLSquad: Multi-Agent Based Interpretable RTL Design](https://arxiv.org/abs/http://arxiv.org/abs/2501.05470v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05470v1
-
-**Abstract:**
-> Optimizing Register-Transfer Level (RTL) code is crucial for improving
-> hardware PPA performance. Large Language Models (LLMs) offer new approaches for
-> automatic RTL code generation and optimization. However, existing methods often
-> lack decision interpretability (sufficient, understandable justification for
-> decisions), making it difficult for hardware engineers to trust the generated
-> results, thus preventing these methods from being integrated into the design
-> process. To address this, we propose RTLSquad, a novel LLM-Based Multi-Agent
-> system for interpretable RTL code generation. RTLSquad divides the design
-> process into exploration, implementation, and verification & evaluation stages
-> managed by specialized agent squads, generating optimized RTL code through
-> inter-agent collaboration, and providing decision interpretability through the
-> communication process. Experiments show that RTLSquad excels in generating
-> functionally correct RTL code and optimizing PPA performance, while also having
-> the capability to provide decision paths, demonstrating the practical value of
-> our system.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a multi-agent system for interpretable RTL (Register-Transfer Level) design using LLMs, rather than focusing on the engineering, design, or optimization of prompts specifically for Large Language Models. Prompt engineering is not the central concern, but rather a means to achieve the system's goals.
-
----
-
-## [Found in Translation: semantic approaches for enhancing AI
-  interpretability in face verification](https://arxiv.org/abs/http://arxiv.org/abs/2501.05471v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05471v1
-
-**Abstract:**
-> The increasing complexity of machine learning models in computer vision,
-> particularly in face verification, requires the development of explainable
-> artificial intelligence (XAI) to enhance interpretability and transparency.
-> This study extends previous work by integrating semantic concepts derived from
-> human cognitive processes into XAI frameworks to bridge the comprehension gap
-> between model outputs and human understanding. We propose a novel approach
-> combining global and local explanations, using semantic features defined by
-> user-selected facial landmarks to generate similarity maps and textual
-> explanations via large language models (LLMs). The methodology was validated
-> through quantitative experiments and user feedback, demonstrating improved
-> interpretability. Results indicate that our semantic-based approach,
-> particularly the most detailed set, offers a more nuanced understanding of
-> model decisions than traditional methods. User studies highlight a preference
-> for our semantic explanations over traditional pixel-based heatmaps, emphasizing
-> the benefits of human-centric interpretability in AI. This work contributes to
-> the ongoing efforts to create XAI frameworks that align AI models' behaviour
-> with human cognitive processes, fostering trust and acceptance in critical
-> applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing AI interpretability in face verification (a computer vision task) using Large Language Models (LLMs) as a component for generating textual explanations, rather than focusing on the engineering, design, or optimization of prompts specifically for LLMs in text-based interactions.
-
----
-
-## [Modality-Invariant Bidirectional Temporal Representation Distillation
-  Network for Missing Multimodal Sentiment Analysis](https://arxiv.org/abs/http://arxiv.org/abs/2501.05474v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05474v1
-
-**Abstract:**
-> Multimodal Sentiment Analysis (MSA) integrates diverse modalities (text,
-> audio, and video) to comprehensively analyze and understand individuals'
-> emotional states. However, the real-world prevalence of incomplete data poses
-> significant challenges to MSA, mainly due to the randomness of modality
-> missing. Moreover, the heterogeneity issue in multimodal data has yet to be
-> effectively addressed. To tackle these challenges, we introduce the
-> Modality-Invariant Bidirectional Temporal Representation Distillation Network
-> (MITR-DNet) for Missing Multimodal Sentiment Analysis. MITR-DNet employs a
-> distillation approach, wherein a complete modality teacher model guides a
-> missing modality student model, ensuring robustness in the presence of modality
-> missing. Simultaneously, we developed the Modality-Invariant Bidirectional
-> Temporal Representation Learning Module (MIB-TRL) to mitigate heterogeneity.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on Multimodal Sentiment Analysis with a network architecture (MITR-DNet) to address modality missing and heterogeneity issues, rather than prompt engineering for Large Language Models (LLMs), failing to meet the 'MUST' criteria for focus on LLM prompt engineering and manipulation of textual input prompts.
-
----
-
-## [Cooperative Search and Track of Rogue Drones using Multiagent
-  Reinforcement Learning](https://arxiv.org/abs/http://arxiv.org/abs/2501.10413v1)
-**arXiv ID:** http://arxiv.org/abs/2501.10413v1
-
-**Abstract:**
-> This work considers the problem of intercepting rogue drones targeting
-> sensitive critical infrastructure facilities. While current interception
-> technologies focus mainly on the jamming/spoofing tasks, the challenges of
-> effectively locating and tracking rogue drones have not received adequate
-> attention. Solving this problem and integrating with recently proposed
-> interception techniques will enable a holistic system that can reliably detect,
-> track, and neutralize rogue drones. Specifically, this work considers a team of
-> pursuer UAVs that can search, detect, and track multiple rogue drones over a
-> sensitive facility. The joint search and track problem is addressed through a
-> novel multiagent reinforcement learning scheme to optimize the agent mobility
-> control actions that maximize the number of rogue drones detected and tracked.
-> The performance of the proposed system is investigated under realistic settings
-> through extensive simulation experiments with varying numbers of agents,
-> demonstrating both its performance and scalability.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on multiagent reinforcement learning for tracking rogue drones, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Efficient Deployment of Large Language Models on Resource-constrained
-  Devices](https://arxiv.org/abs/http://arxiv.org/abs/2501.02438v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02438v1
-
-**Abstract:**
-> Deploying Large Language Models (LLMs) on resource-constrained (or weak)
-> devices presents significant challenges due to limited resources and
-> heterogeneous data distribution. To address the data concern, it is necessary
-> to fine-tune LLMs using on-device private data for various downstream tasks.
-> While Federated Learning (FL) offers a promising privacy-preserving solution,
-> existing fine-tuning methods retain the original LLM size, leaving issues of
-> high inference latency and excessive memory demands unresolved. Hence, we
-> design FedSpine, an FL framework that combines Parameter-Efficient Fine-Tuning
-> (PEFT) with structured pruning for efficient deployment of LLMs on
-> resource-constrained devices. Specifically, FedSpine introduces an iterative
-> process to prune and tune the parameters of LLMs. To mitigate the impact of
-> device heterogeneity, an online Multi-Armed Bandit (MAB) algorithm is employed
-> to adaptively determine different pruning ratios and LoRA ranks for
-> heterogeneous devices without any prior knowledge of their computing and
-> communication capabilities. As a result, FedSpine maintains higher inference
-> accuracy while improving fine-tuning efficiency. Experimental results conducted
-> on a physical platform with 80 devices demonstrate that FedSpine can speed up
-> fine-tuning by 1.4$\times$-6.9$\times$ and improve final accuracy by 0.4%-4.5%
-> under the same sparsity level compared to other baselines.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the efficient deployment and fine-tuning of Large Language Models on resource-constrained devices, rather than on the engineering, design, or optimization of prompts for LLMs, failing to meet the first 'MUST' criterion.
-
----
-
-## [FedRSClip: Federated Learning for Remote Sensing Scene Classification
-  Using Vision-Language Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02461v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02461v1
-
-**Abstract:**
-> Remote sensing data is often distributed across multiple institutions, and
-> due to privacy concerns and data-sharing restrictions, leveraging large-scale
-> datasets in a centralized training framework is challenging. Federated learning
-> offers a promising solution by enabling collaborative model training across
-> distributed data sources without requiring data centralization. However,
-> current Vision-Language Models (VLMs), which typically contain billions of
-> parameters, pose significant communication challenges for traditional federated
-> learning approaches based on model parameter updates, as they would incur
-> substantial communication costs. In this paper, we propose FedRSCLIP, the first
-> federated learning framework designed for remote sensing image classification
-> based on a VLM, specifically CLIP. FedRSCLIP addresses the challenges of data
-> heterogeneity and large-scale model transmission in federated environments by
-> introducing Prompt Learning, which optimizes only a small set of tunable
-> parameters. The framework introduces a dual-prompt mechanism, comprising Shared
-> Prompts for global knowledge sharing and Private Prompts for client-specific
-> adaptation. To maintain semantic coherence between shared and private prompts,
-> we propose the Dual Prompt Alignment Constraint to balance global consistency
-> and local adaptability across diverse client distributions. Additionally, to
-> enhance cross-modal representation learning, we introduce the Cross-Modal
-> Feature Alignment Constraint to align multimodal features between text and
-> image prompts. To validate the effectiveness of our proposed model, we
-> construct a Fed-RSIC dataset based on three existing remote sensing image
-> classification datasets, specifically designed to simulate various federated
-> learning configurations. Experimental results demonstrate the effectiveness and
-> superiority of FedRSCLIP in remote sensing image classification.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on federated learning for Vision-Language Models (VLMs) in remote sensing image classification, not on the engineering, design, or optimization of textual input prompts for Large Language Models (LLMs), and does not provide concrete examples of text prompts impacting LLM output.
-
----
-
-## [Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera](https://arxiv.org/abs/http://arxiv.org/abs/2501.02464v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02464v1
-
-**Abstract:**
-> While recent depth estimation methods exhibit strong zero-shot
-> generalization, achieving accurate metric depth across diverse camera
-> types, particularly those with large fields of view (FoV) such as fisheye and
-> 360-degree cameras, remains a significant challenge. This paper presents Depth
-> Any Camera (DAC), a powerful zero-shot metric depth estimation framework that
-> extends a perspective-trained model to effectively handle cameras with varying
-> FoVs. The framework is designed to ensure that all existing 3D data can be
-> leveraged, regardless of the specific camera types used in new applications.
-> Remarkably, DAC is trained exclusively on perspective images but generalizes
-> seamlessly to fisheye and 360-degree cameras without the need for specialized
-> training data. DAC employs Equi-Rectangular Projection (ERP) as a unified image
-> representation, enabling consistent processing of images with diverse FoVs. Its
-> key components include a pitch-aware Image-to-ERP conversion for efficient
-> online augmentation in ERP space, a FoV alignment operation to support
-> effective training across a wide range of FoVs, and multi-resolution data
-> augmentation to address resolution disparities between training and testing.
-> DAC achieves state-of-the-art zero-shot metric depth estimation, improving
-> delta-1 ($\delta_1$) accuracy by up to 50% on multiple fisheye and 360-degree
-> datasets compared to prior metric depth foundation models, demonstrating robust
-> generalization across camera types.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on depth estimation in computer vision, specifically developing a framework for metric depth estimation from various camera types, and does not address prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for
-  Inference Cost Optimization](https://arxiv.org/abs/http://arxiv.org/abs/2501.02508v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02508v1
-
-**Abstract:**
-> For many practical applications, a high computational cost of inference over
-> deep network architectures might be unacceptable. A small degradation in the
-> overall inference accuracy might be a reasonable price to pay for a significant
-> reduction in the required computational resources. In this work, we describe a
-> method for introducing "shortcuts" into the DNN feedforward inference process
-> by skipping costly feedforward computations whenever possible. The proposed
-> method is based on the previously described BranchyNet (Teerapittayanon et al.,
-> 2016) and the EEnet (Demir, 2019) architectures that jointly train the main
-> network and early exit branches. We extend those methods by attaching branches
-> to pre-trained models and, thus, eliminating the need to alter the original
-> weights of the network. We also suggest a new branch architecture based on
-> convolutional building blocks to allow enough training capacity when applied on
-> large DNNs. The proposed architecture includes confidence heads that are used
-> for predicting the confidence level in the corresponding early exits. By
-> defining adjusted thresholds on these confidence extensions, we can control in
-> real-time the amount of data exiting from each branch and the overall tradeoff
-> between speed and accuracy of our model. In our experiments, we evaluate our
-> method using image datasets (SVHN and CIFAR10) and several DNN architectures
-> (ResNet, DenseNet, VGG) with varied depth. Our results demonstrate that the
-> proposed method enables us to reduce the average inference computational cost
-> and further control the tradeoff between the model accuracy and the
-> computation cost.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on optimizing the inference cost of deep neural networks (DNNs) through early-exit strategies, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thus failing to meet the primary criteria.
-
----
-
-## [Evaluating Large Language Models Against Human Annotators in Latent
-  Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and
-  Sarcasm](https://arxiv.org/abs/http://arxiv.org/abs/2501.02532v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02532v1
-
-**Abstract:**
-> In the era of rapid digital communication, vast amounts of textual data are
-> generated daily, demanding efficient methods for latent content analysis to
-> extract meaningful insights. Large Language Models (LLMs) offer potential for
-> automating this process, yet comprehensive assessments comparing their
-> performance to human annotators across multiple dimensions are lacking. This
-> study evaluates the reliability, consistency, and quality of seven
-> state-of-the-art LLMs, including variants of OpenAI's GPT-4, Gemini, Llama, and
-> Mixtral, relative to human annotators in analyzing sentiment, political
-> leaning, emotional intensity, and sarcasm detection. A total of 33 human
-> annotators and eight LLM variants assessed 100 curated textual items,
-> generating 3,300 human and 19,200 LLM annotations, with LLMs evaluated across
-> three time points to examine temporal consistency. Inter-rater reliability was
-> measured using Krippendorff's alpha, and intra-class correlation coefficients
-> assessed consistency over time. The results reveal that both humans and LLMs
-> exhibit high reliability in sentiment analysis and political leaning
-> assessments, with LLMs demonstrating higher internal consistency than humans.
-> In emotional intensity, LLMs displayed higher agreement compared to humans,
-> though humans rated emotional intensity significantly higher. Both groups
-> struggled with sarcasm detection, evidenced by low agreement. LLMs showed
-> excellent temporal consistency across all dimensions, indicating stable
-> performance over time. This research concludes that LLMs, especially GPT-4, can
-> effectively replicate human analysis in sentiment and political leaning,
-> although human expertise remains essential for emotional intensity
-> interpretation. The findings demonstrate the potential of LLMs for consistent
-> and high-quality performance in certain areas of latent content analysis.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on evaluating LLMs' performance in latent content analysis compared to human annotators, rather than primarily investigating, analyzing, or proposing methods for improving LLM performance through the manipulation of textual input prompts.
-
----
-
-## [Energy Optimization of Multi-task DNN Inference in MEC-assisted XR
-  Devices: A Lyapunov-Guided Reinforcement Learning Approach](https://arxiv.org/abs/http://arxiv.org/abs/2501.02572v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02572v1
-
-**Abstract:**
-> Extended reality (XR), blending virtual and real worlds, is a key application
-> of future networks. While AI advancements enhance XR capabilities, they also
-> impose significant computational and energy challenges on lightweight XR
-> devices. In this paper, we developed a distributed queue model for multi-task
-> DNN inference, addressing issues of resource competition and queue coupling. In
-> response to the challenges posed by the high energy consumption and limited
-> resources of XR devices, we designed a dual time-scale joint optimization
-> strategy for model partitioning and resource allocation, formulated as a
-> bi-level optimization problem. This strategy aims to minimize the total energy
-> consumption of XR devices while ensuring queue stability and adhering to
-> computational and communication resource constraints. To tackle this problem,
-> we devised a Lyapunov-guided Proximal Policy Optimization algorithm, named
-> LyaPPO. Numerical results demonstrate that the LyaPPO algorithm outperforms the
-> baselines, achieving energy conservation of 24.79% to 46.14% under varying
-> resource capacities. Specifically, the proposed algorithm reduces the energy
-> consumption of XR devices by 24.29% to 56.62% compared to baseline algorithms.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on energy optimization of Multi-task DNN Inference in MEC-assisted XR Devices using Reinforcement Learning, with no apparent focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), thus failing to meet the 'MUST' criteria.
-
----
-
-## [TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud
-  Platforms](https://arxiv.org/abs/http://arxiv.org/abs/2501.02600v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02600v1
-
-**Abstract:**
-> The rising demand for generative large language models (LLMs) poses
-> challenges for thermal and power management in cloud datacenters. Traditional
-> techniques often are inadequate for LLM inference due to the fine-grained,
-> millisecond-scale execution phases, each with distinct performance, thermal,
-> and power profiles. Additionally, LLM inference workloads are sensitive to
-> various configuration parameters (e.g., model parallelism, size, and
-> quantization) that involve trade-offs between performance, temperature, power,
-> and output quality. Moreover, clouds often co-locate SaaS and IaaS workloads,
-> each with different levels of visibility and flexibility. We propose TAPAS, a
-> thermal- and power-aware framework designed for LLM inference clusters in the
-> cloud. TAPAS enhances cooling and power oversubscription capabilities, reducing
-> the total cost of ownership (TCO) while effectively handling emergencies (e.g.,
-> cooling and power failures). The system leverages historical temperature and
-> power data, along with the adaptability of SaaS workloads, to: (1) efficiently
-> place new GPU workload VMs within cooling and power constraints, (2) route LLM
-> inference requests across SaaS VMs, and (3) reconfigure SaaS VMs to manage load
-> spikes and emergency situations. Our evaluation on a large GPU cluster
-> demonstrates significant reductions in thermal and power throttling events,
-> boosting system efficiency.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on thermal- and power-aware scheduling for LLM inference in cloud platforms, addressing infrastructure management rather than prompt engineering for text-based interactions with LLMs, thus failing to meet the core subject requirement.
-
----
-
-## [Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for
-  Jailbreak Attack Defense](https://arxiv.org/abs/http://arxiv.org/abs/2501.02629v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02629v1
-
-**Abstract:**
-> As large language models (LLMs) are increasingly deployed in diverse
-> applications, including chatbot assistants and code generation, aligning their
-> behavior with safety and ethical standards has become paramount. However,
-> jailbreak attacks, which exploit vulnerabilities to elicit unintended or
-> harmful outputs, threaten LLMs' safety significantly. In this paper, we
-> introduce Layer-AdvPatcher, a novel methodology designed to defend against
-> jailbreak attacks by utilizing an unlearning strategy to patch specific layers
-> within LLMs through self-augmented datasets. Our insight is that certain
-> layer(s) tend to produce affirmative tokens when faced with harmful prompts.
-> By identifying these layers and adversarially exposing them to generate more
-> harmful data, one can understand their inherent and diverse vulnerabilities to
-> attacks. With these exposures, we then "unlearn" these issues, reducing the
-> impact of affirmative tokens and hence minimizing jailbreak risks while keeping
-> the model's responses to safe queries intact. We conduct extensive experiments
-> on two models, four benchmark datasets, and multiple state-of-the-art jailbreak
-> benchmarks to demonstrate the efficacy of our approach. Results indicate that
-> our framework reduces the harmfulness and attack success rate of jailbreak
-> attacks without compromising utility for benign queries compared to recent
-> defense methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on defending against jailbreak attacks by modifying LLM architecture (Layer-AdvPatcher) and utilizing an unlearning strategy, rather than primarily focusing on the engineering, design, or optimization of prompts for improving LLM performance through textual input manipulation.
-
----
-
-## [Representation Learning of Lab Values via Masked AutoEncoder](https://arxiv.org/abs/http://arxiv.org/abs/2501.02648v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02648v2
-
-**Abstract:**
-> Accurate imputation of missing laboratory values in electronic health records
-> (EHRs) is critical to enable robust clinical predictions and reduce biases in
-> AI systems in healthcare. Existing methods, such as variational autoencoders
-> (VAEs) and decision tree-based approaches such as XGBoost, struggle to model
-> the complex temporal and contextual dependencies in EHR data, mainly in
-> underrepresented groups. In this work, we propose Lab-MAE, a novel
-> transformer-based masked autoencoder framework that leverages self-supervised
-> learning for the imputation of continuous sequential lab values. Lab-MAE
-> introduces a structured encoding scheme that jointly models laboratory test
-> values and their corresponding timestamps, enabling explicit capture of temporal
-> dependencies. Empirical evaluation on the MIMIC-IV dataset demonstrates that
-> Lab-MAE significantly outperforms the state-of-the-art baselines such as
-> XGBoost across multiple metrics, including root mean square error (RMSE),
-> R-squared (R2), and Wasserstein distance (WD). Notably, Lab-MAE achieves
-> equitable performance across demographic groups of patients, advancing fairness
-> in clinical predictions. We further investigate the role of follow-up
-> laboratory values as potential shortcut features, revealing Lab-MAE's
-> robustness in scenarios where such data is unavailable. The findings suggest
-> that our transformer-based architecture, adapted to the characteristics of the
-> EHR data, offers a foundation model for more accurate and fair clinical
-> imputation models. In addition, we measure and compare the carbon footprint of
-> Lab-MAE with the baseline XGBoost model, highlighting its environmental
-> requirements.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a novel transformer-based masked autoencoder for imputing missing laboratory values in electronic health records, which does not meet the 'MUST' criteria of primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts for improving LLM performance.
-
----
-
-## [From thermodynamics to protein design: Diffusion models for biomolecule
-  generation towards autonomous protein engineering](https://arxiv.org/abs/http://arxiv.org/abs/2501.02680v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02680v1
-
-**Abstract:**
-> Protein design with desirable properties has been a significant challenge for
-> many decades. Generative artificial intelligence is a promising approach and
-> has achieved great success in various protein generation tasks. Notably,
-> diffusion models stand out for their robust mathematical foundations and
-> impressive generative capabilities, offering unique advantages in certain
-> applications such as protein design. In this review, we first give the
-> definition and characteristics of diffusion models and then focus on two
-> strategies: Denoising Diffusion Probabilistic Models and Score-based Generative
-> Models, where DDPM is the discrete form of SGM. Furthermore, we discuss their
-> applications in protein design, peptide generation, drug discovery, and
-> protein-ligand interaction. Finally, we outline the future perspectives of
-> diffusion models to advance autonomous protein design and engineering. The E(3)
-> group consists of all rotations, reflections, and translations in
-> three-dimensions. The equivariance on the E(3) group can keep the physical
-> stability of the frame of each amino acid as much as possible, and we reflect
-> on how to keep the diffusion model E(3) equivariant for protein generation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the application of diffusion models for biomolecule generation in protein design, not on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs). It also concerns a medical/biological subject (protein design) and generative AI application (protein generation) outside the specified text generation driven by LLMs.
-
----
-
-## [EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in
-  Instructional Multimodal Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02699v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02699v1
-
-**Abstract:**
-> Large language models and vision transformers have demonstrated impressive
-> zero-shot capabilities, enabling significant transferability in downstream
-> tasks. The fusion of these models has resulted in multi-modal architectures
-> with enhanced instructional capabilities. Despite incorporating vast image and
-> language pre-training, these multi-modal architectures often generate responses
-> that deviate from the ground truth in the image data. These failure cases are
-> known as hallucinations. Current methods for mitigating hallucinations
-> generally focus on regularizing the language component, improving the fusion
-> module, or ensembling multiple visual encoders to improve visual
-> representation. In this paper, we address the hallucination issue by directly
-> enhancing the capabilities of the visual component. Our approach, named EAGLE,
-> is fully agnostic to the LLM or fusion module and works as a post-pretraining
-> approach that improves the grounding and language alignment of the visual
-> encoder. We show that a straightforward reformulation of the original
-> contrastive pre-training task results in an improved visual encoder that can be
-> incorporated into the instructional multi-modal architecture without additional
-> instructional training. As a result, EAGLE achieves a significant reduction in
-> hallucinations across multiple challenging benchmarks and tasks.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on enhancing visual grounding in multimodal models to reduce hallucinations, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs). It meets none of the 'MUST' criteria for prompt engineering, focusing instead on improving the visual component of multimodal architectures.
-
----
-
-## [QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted
-  Question Matching for Enhanced QA Performance](https://arxiv.org/abs/http://arxiv.org/abs/2501.02702v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02702v1
-
-**Abstract:**
-> This work presents a novel architecture for building Retrieval-Augmented
-> Generation (RAG) systems to improve Question Answering (QA) tasks from a target
-> corpus. Large Language Models (LLMs) have revolutionized the analysis and
-> generation of human-like text. These models rely on pre-trained data and lack
-> real-time updates unless integrated with live data tools. RAG enhances LLMs by
-> integrating online resources and databases to generate contextually appropriate
-> responses. However, traditional RAG still encounters challenges like
-> information dilution and hallucinations when handling vast amounts of data. Our
-> approach addresses these challenges by converting corpora into a
-> domain-specific dataset and constructing a RAG architecture to generate
-> responses from the target document. We introduce QuIM-RAG (Question-to-question
-> Inverted Index Matching), a novel approach for the retrieval mechanism in our
-> system. This strategy generates potential questions from document chunks and
-> matches these with user queries to identify the most relevant text chunks for
-> generating accurate answers. We have implemented our RAG system on top of the
-> open-source Meta-LLaMA3-8B-instruct model by Meta Inc. that is available on
-> Hugging Face. We constructed a custom corpus of 500+ pages from a high-traffic
-> website accessed thousands of times daily for answering complex questions,
-> along with manually prepared ground truth QA for evaluation. We compared our
-> approach with traditional RAG models using BERT-Score and RAGAS,
-> state-of-the-art metrics for evaluating LLM applications. Our evaluation
-> demonstrates that our approach outperforms traditional RAG architectures on
-> both metrics.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on advancing Retrieval-Augmented Generation (RAG) system architecture for Question Answering tasks, rather than specifically on the engineering, design, or optimization of prompts for Large Language Models (LLMs), as required. While LLMs are utilized, the core subject is the RAG system's enhancement, not prompt engineering for text-based LLM interactions.
-
----
-
-## [OpenGU: A Comprehensive Benchmark for Graph Unlearning](https://arxiv.org/abs/2501.02728v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02728v1
-
-**Abstract:**
-> Graph Machine Learning is essential for understanding and analyzing
-> relational data. However, privacy-sensitive applications demand the ability to
-> efficiently remove sensitive information from trained graph neural networks
-> (GNNs), avoiding the unnecessary time and space overhead caused by retraining
-> models from scratch. To address this issue, Graph Unlearning (GU) has emerged
-> as a critical solution, with the potential to support dynamic graph updates in
-> data management systems and enable scalable unlearning in distributed data
-> systems while ensuring privacy compliance. Unlike machine unlearning in
-> computer vision or other fields, GU faces unique difficulties due to the
-> non-Euclidean nature of graph data and the recursive message-passing mechanism
-> of GNNs. Additionally, the diversity of downstream tasks and the complexity of
-> unlearning requests further amplify these challenges. Despite the proliferation
-> of diverse GU strategies, the absence of a benchmark providing fair comparisons
-> for GU, and the limited flexibility in combining downstream tasks and
-> unlearning requests, have yielded inconsistencies in evaluations, hindering the
-> development of this domain. To fill this gap, we present OpenGU, the first GU
-> benchmark, where 16 SOTA GU algorithms and 37 multi-domain datasets are
-> integrated, enabling various downstream tasks with 13 GNN backbones when
-> responding to flexible unlearning requests. Based on this unified benchmark
-> framework, we are able to provide a comprehensive and fair evaluation for GU.
-> Through extensive experimentation, we have drawn 8 crucial conclusions about
-> existing GU methods, while also gaining valuable insights into their
-> limitations, shedding light on potential avenues for future research.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance. Instead, it focuses on Graph Unlearning (GU) for Graph Neural Networks (GNNs), which falls outside the specified criteria.
-
----
-
-## [GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic
-  Features for Medical Image Segmentation](https://arxiv.org/abs/2501.02788v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02788v2
-
-**Abstract:**
-> Vision Transformers (ViTs) have shown promise in medical image semantic
-> segmentation (MISS) by capturing long-range correlations. However, ViTs often
-> struggle to model local spatial information effectively, which is essential for
-> accurately segmenting fine anatomical details, particularly when applied to
-> small datasets without extensive pre-training. We introduce Gabor and Laplacian
-> of Gaussian Convolutional Swin Network (GLoG-CSUnet), a novel architecture
-> enhancing Transformer-based models by incorporating learnable radiomic
-> features. This approach integrates dynamically adaptive Gabor and Laplacian of
-> Gaussian (LoG) filters to capture texture, edge, and boundary information,
-> enhancing the feature representation processed by the Transformer model. Our
-> method uniquely combines the long-range dependency modeling of Transformers
-> with the texture analysis capabilities of Gabor and LoG features. Evaluated on
-> the Synapse multi-organ and ACDC cardiac segmentation datasets, GLoG-CSUnet
-> demonstrates significant improvements over state-of-the-art models, achieving a
-> 1.14% increase in Dice score for Synapse and 0.99% for ACDC, with minimal
-> computational overhead (only 15 and 30 additional parameters, respectively).
-> GLoG-CSUnet's flexible design allows integration with various base models,
-> offering a promising approach for incorporating radiomics-inspired feature
-> extraction in Transformer architectures for medical image analysis. The code
-> implementation is available on GitHub at: https://github.com/HAAIL/GLoG-CSUnet.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing Vision Transformers with adaptable radiomic features for Medical Image Segmentation, which falls under the excluded categories of 'medical subjects' and 'applications of generative AI other than text generation driven by LLMs', and does not meet the core subject requirement of prompt engineering for text-based interactions with LLMs.
-
----
-
-## [RDD4D: 4D Attention-Guided Road Damage Detection And Classification](https://arxiv.org/abs/2501.02822v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02822v1
-
-**Abstract:**
-> Road damage detection and assessment are crucial components of infrastructure
-> maintenance. However, current methods often struggle with detecting multiple
-> types of road damage in a single image, particularly at varying scales. This is
-> due to the lack of road datasets with various damage types having varying
-> scales. To overcome this deficiency, first, we present a novel dataset called
-> Diverse Road Damage Dataset (DRDD) for road damage detection that captures the
-> diverse road damage types in individual images, addressing a crucial gap in
-> existing datasets. Then, we provide our model, RDD4D, that exploits Attention4D
-> blocks, enabling better feature refinement across multiple scales. The
-> Attention4D module processes feature maps through an attention mechanism
-> combining positional encoding and "Talking Head" components to capture local
-> and global contextual information. In our comprehensive experimental analysis
-> comparing various state-of-the-art models on our proposed dataset, our enhanced model
-> demonstrated superior performance in detecting large-sized road cracks with an
-> Average Precision (AP) of 0.458 and maintained competitive performance with an
-> overall AP of 0.445. Moreover, we also provide results on the CrackTinyNet
-> dataset; our model achieved around a 0.21 increase in performance. The code,
-> model weights, dataset, and our results are available on
-> https://github.com/msaqib17/Road_Damage_Detection.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on road damage detection using computer vision and deep learning techniques, with no mention of Large Language Models (LLMs), prompt engineering, or textual input prompts, thereby failing to meet all the 'MUST' criteria.
-
----
-
-## [Forward Once for All: Structural Parameterized Adaptation for Efficient
-  Cloud-coordinated On-device Recommendation](https://arxiv.org/abs/2501.02837v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02837v1
-
-**Abstract:**
-> In cloud-centric recommender system, regular data exchanges between user
-> devices and cloud could potentially elevate bandwidth demands and privacy
-> risks. On-device recommendation emerges as a viable solution by performing
-> reranking locally to alleviate these concerns. Existing methods primarily focus
-> on developing local adaptive parameters, while potentially neglecting the
-> critical role of tailor-made model architecture. Insights from broader research
-> domains suggest that varying data distributions might favor distinct
-> architectures for better fitting. In addition, imposing a uniform model
-> structure across heterogeneous devices may result in risking inefficacy on less
-> capable devices or sub-optimal performance on those with sufficient
-> capabilities. In response to these gaps, our paper introduces Forward-OFA, a
-> novel approach for the dynamic construction of device-specific networks (both
-> structure and parameters). Forward-OFA employs a structure controller to
-> selectively determine whether each block needs to be assembled for a given
-> device. However, during the training of the structure controller, these
-> assembled heterogeneous structures are jointly optimized, where the co-adaption
-> among blocks might encounter gradient conflicts. To mitigate this, Forward-OFA
-> is designed to establish a structure-guided mapping of real-time behaviors to
-> the parameters of assembled networks. Structure-related parameters and parallel
-> components within the mapper prevent each part from receiving heterogeneous
-> gradients from others, thus bypassing the gradient conflicts for coupled
-> optimization. Besides, direct mapping enables Forward-OFA to achieve adaptation
-> through only one forward pass, allowing for swift adaptation to changing
-> interests and eliminating the requirement for on-device backpropagation.
-> Experiments on real-world datasets demonstrate the effectiveness and efficiency
-> of Forward-OFA.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a novel approach for dynamic construction of device-specific networks for on-device recommendation, with no clear emphasis on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Explaining Humour Style Classifications: An XAI Approach to
-  Understanding Computational Humour Analysis](https://arxiv.org/abs/2501.02891v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02891v1
-
-**Abstract:**
-> Humour styles can have either a negative or a positive impact on well-being.
-> Given the importance of these styles to mental health, significant research has
-> been conducted on their automatic identification. However, the automated
-> machine learning models used for this purpose are black boxes, making their
-> prediction decisions opaque. Clarity and transparency are vital in the field of
-> mental health. This paper presents an explainable AI (XAI) framework for
-> understanding humour style classification, building upon previous work in
-> computational humour analysis. Using the best-performing single model
-> (ALI+XGBoost) from prior research, we apply comprehensive XAI techniques to
-> analyse how linguistic, emotional, and semantic features contribute to humour
-> style classification decisions. Our analysis reveals distinct patterns in how
-> different humour styles are characterised and misclassified, with particular
-> emphasis on the challenges in distinguishing affiliative humour from other
-> styles. Through detailed examination of feature importance, error patterns, and
-> misclassification cases, we identify key factors influencing model decisions,
-> including emotional ambiguity, context misinterpretation, and target
-> identification. The framework demonstrates significant utility in understanding
-> model behaviour, achieving interpretable insights into the complex interplay of
-> features that define different humour styles. Our findings contribute to both
-> the theoretical understanding of computational humour analysis and practical
-> applications in mental health, content moderation, and digital humanities
-> research.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on explainable AI (XAI) for humour style classification in computational humour analysis, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), failing to meet the first 'MUST' criterion.
-
----
-
-## [Label-free Concept Based Multiple Instance Learning for Gigapixel
-  Histopathology](https://arxiv.org/abs/2501.02922v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02922v1
-
-**Abstract:**
-> Multiple Instance Learning (MIL) methods allow for gigapixel Whole-Slide
-> Image (WSI) analysis with only slide-level annotations. Interpretability is
-> crucial for safely deploying such algorithms in high-stakes medical domains.
-> Traditional MIL methods offer explanations by highlighting salient regions.
-> However, such spatial heatmaps provide limited insights for end users. To
-> address this, we propose a novel inherently interpretable WSI-classification
-> approach that uses human-understandable pathology concepts to generate
-> explanations. Our proposed Concept MIL model leverages recent advances in
-> vision-language models to directly predict pathology concepts based on image
-> features. The model's predictions are obtained through a linear combination of
-> the concepts identified on the top-K patches of a WSI, enabling inherent
-> explanations by tracing each concept's influence on the prediction. In contrast
-> to traditional concept-based interpretable models, our approach eliminates the
-> need for costly human annotations by leveraging the vision-language model. We
-> validate our method on two widely used pathology datasets: Camelyon16 and
-> PANDA. On both datasets, Concept MIL achieves AUC and accuracy scores over 0.9,
-> putting it on par with state-of-the-art models. We further find that 87.1%
-> (Camelyon16) and 85.3% (PANDA) of the top 20 patches fall within the tumor
-> region. A user study shows that the concepts identified by our model align with
-> the concepts used by pathologists, making it a promising strategy for
-> human-interpretable WSI classification.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on medical image analysis (histopathology) using vision-language models, not on engineering, design, or optimization of textual input prompts for Large Language Models (LLMs), and does not meet the core subject requirement of prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the
-  Wild](https://arxiv.org/abs/2501.02964v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02964v2
-
-**Abstract:**
-> Complex visual reasoning remains a key challenge today. Typically, the
-> challenge is tackled using methodologies such as Chain of Thought (COT) and
-> visual instruction tuning. However, how to organically combine these two
-> methodologies for greater success remains unexplored. Also, issues like
-> hallucinations and high training cost still need to be addressed. In this work,
-> we devise an innovative multi-round training and reasoning framework suitable
-> for lightweight Multimodal Large Language Models (MLLMs). Our self-questioning
-> approach heuristically guides MLLMs to focus on visual clues relevant to the
-> target problem, reducing hallucinations and enhancing the model's ability to
-> describe fine-grained image details. This ultimately enables the model to
-> perform well in complex visual reasoning and question-answering tasks. We have
-> named this framework Socratic Questioning (SQ). To facilitate future research,
-> we create a multimodal mini-dataset named CapQA, which includes 1k images of
-> fine-grained activities, for visual instruction tuning and evaluation, our
-> proposed SQ method leads to a 31.2% improvement in the hallucination score. Our
-> extensive experiments on various benchmarks demonstrate SQ's remarkable
-> capabilities in heuristic self-questioning, zero-shot visual reasoning and
-> hallucination mitigation. Our model and code will be publicly available.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multimodal reasoning, visual question-answering, and reducing hallucinations in Multimodal Large Language Models (MLLMs) through a self-questioning framework, rather than primarily on the engineering, design, or optimization of textual input prompts for Large Language Models (LLMs) as required.
-
----
-
-## [Proof-of-Data: A Consensus Protocol for Collaborative Intelligence](https://arxiv.org/abs/2501.02971v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02971v1
-
-**Abstract:**
-> Existing research on federated learning has been focused on the setting where
-> learning is coordinated by a centralized entity. Yet the greatest potential of
-> future collaborative intelligence would be unleashed in a more open and
-> democratized setting with no central entity in a dominant role, referred to as
-> "decentralized federated learning". New challenges arise accordingly in
-> achieving both correct model training and fair reward allocation with
-> collective effort among all participating nodes, especially with the threat of
-> the Byzantine node jeopardising both tasks.
-> In this paper, we propose a blockchain-based decentralized Byzantine
-> fault-tolerant federated learning framework based on a novel Proof-of-Data
-> (PoD) consensus protocol to resolve both the "trust" and "incentive"
-> components. By decoupling model training and contribution accounting, PoD is
-> able to enjoy not only the benefit of learning efficiency and system liveliness
-> from asynchronous societal-scale PoW-style learning but also the finality of
-> consensus and reward allocation from epoch-based BFT-style voting. To mitigate
-> false reward claims by data forgery from Byzantine attacks, a privacy-aware
-> data verification and contribution-based reward allocation mechanism is
-> designed to complete the framework. Our evaluation results show that PoD
-> demonstrates performance in model training close to that of the centralized
-> counterpart while achieving trust in consensus and fairness for reward
-> allocation with a fault tolerance ratio of 1/3.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on decentralized federated learning, blockchain, and a consensus protocol, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular
-  Balls](https://arxiv.org/abs/2501.02975v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02975v1
-
-**Abstract:**
-> Outlier detection refers to the identification of anomalous samples that
-> deviate significantly from the distribution of normal data and has been
-> extensively studied and used in a variety of practical tasks. However, most
-> unsupervised outlier detection methods are carefully designed to detect
-> specified outliers, while real-world data may be entangled with different types
-> of outliers. In this study, we propose a fuzzy rough sets-based multi-scale
-> outlier detection method to identify various types of outliers. Specifically, a
-> novel fuzzy rough sets-based method that integrates relative fuzzy granule
-> density is first introduced to improve the capability of detecting local
-> outliers. Then, a multi-scale view generation method based on granular-ball
-> computing is proposed to collaboratively identify group outliers at different
-> levels of granularity. Moreover, reliable outliers and inliers determined by
-> the three-way decision are used to train a weighted support vector machine to
-> further improve the performance of outlier detection. The proposed method
-> innovatively transforms unsupervised outlier detection into a semi-supervised
-> classification problem and for the first time explores the fuzzy rough
-> sets-based outlier detection from the perspective of multi-scale granular
-> balls, allowing for high adaptability to different types of outliers. Extensive
-> experiments carried out on both artificial and UCI datasets demonstrate that
-> the proposed outlier detection method significantly outperforms the
-> state-of-the-art methods, improving the results by at least 8.48% in terms of
-> the Area Under the ROC Curve (AUROC) index. The source codes are released at
-> https://github.com/Xiaofeng-Tan/MGBOD.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary criteria as it focuses on outlier detection using fuzzy rough sets and multi-scale granular balls, with no mention of Large Language Models (LLMs), prompt engineering, or text generation, thereby failing to align with the specified requirements.
-
----
-
-## [CAMP: Collaborative Attention Model with Profiles for Vehicle Routing
-  Problems](https://arxiv.org/abs/2501.02977v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02977v1
-
-**Abstract:**
-> The profiled vehicle routing problem (PVRP) is a generalization of the
-> heterogeneous capacitated vehicle routing problem (HCVRP) in which the
-> objective is to optimize the routes of vehicles to serve client demands subject
-> to different vehicle profiles, with each having a preference or constraint on a
-> per-client basis. While existing learning methods have shown promise for
-> solving the HCVRP in real-time, no learning method exists to solve the more
-> practical and challenging PVRP. In this paper, we propose a Collaborative
-> Attention Model with Profiles (CAMP), a novel approach that learns efficient
-> solvers for PVRP using multi-agent reinforcement learning. CAMP employs a
-> specialized attention-based encoder architecture to embed profiled client
-> embeddings in parallel for each vehicle profile. We design a communication
-> layer between agents for collaborative decision-making across profiled
-> embeddings at each decoding step and a batched pointer mechanism to attend to
-> the profiled embeddings to evaluate the likelihood of the next actions. We
-> evaluate CAMP on two variants of PVRPs: PVRP with preferences, which explicitly
-> influence the reward function, and PVRP with zone constraints with different
-> numbers of agents and clients, demonstrating that our learned solvers achieve
-> competitive results compared to both classical and state-of-the-art neural
-> multi-agent models in terms of solution quality and computational efficiency.
-> We make our code openly available at https://github.com/ai4co/camp.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on developing a multi-agent reinforcement learning model (CAMP) for solving vehicle routing problems, with no emphasis on prompt engineering, Large Language Models (LLMs), or the manipulation of textual input prompts for LLM performance.
-
----
-
-## [CONTINUUM: Detecting APT Attacks through Spatial-Temporal Graph Neural
-  Networks](https://arxiv.org/abs/2501.02981v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02981v2
-
-**Abstract:**
-> Advanced Persistent Threats (APTs) represent a significant challenge in
-> cybersecurity due to their sophisticated and stealthy nature. Traditional
-> Intrusion Detection Systems (IDS) often fall short in detecting these
-> multi-stage attacks. Recently, Graph Neural Networks (GNNs) have been employed
-> to enhance IDS capabilities by analyzing the complex relationships within
-> networked data. However, existing GNN-based solutions are hampered by high
-> false positive rates and substantial resource consumption. In this paper, we
-> present a novel IDS designed to detect APTs using a Spatio-Temporal Graph
-> Neural Network Autoencoder. Our approach leverages spatial information to
-> understand the interactions between entities within a graph and temporal
-> information to capture the evolution of the graph over time. This dual
-> perspective is crucial for identifying the sequential stages of APTs.
-> Furthermore, to address privacy and scalability concerns, we deploy our
-> architecture in a federated learning environment. This setup ensures that local
-> data remains on-premise while encrypted model-weights are shared and aggregated
-> using homomorphic encryption, maintaining data privacy and security. Our
-> evaluation shows that this system effectively detects APTs with lower false
-> positive rates and optimized resource usage compared to existing methods,
-> highlighting the potential of spatio-temporal analysis and federated learning
-> in enhancing cybersecurity defenses.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the 'MUST' criteria as it primarily focuses on detecting APT attacks using Spatio-Temporal Graph Neural Networks in cybersecurity, with no emphasis on engineering, design, or optimization of prompts for Large Language Models (LLMs)."
-}
-
----
-
-## [A Bio-Inspired Research Paradigm of Collision Perception Neurons
-  Enabling Neuro-Robotic Integration: The LGMD Case](https://arxiv.org/abs/2501.02982v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02982v1
-
-**Abstract:**
-> Compared to human vision, insect visual systems excel at rapid and precise
-> collision detection, despite relying on only tens of thousands of neurons
-> organized through a few neuropils. This efficiency makes them an attractive
-> model system for developing artificial collision-detecting systems.
-> Specifically, researchers have identified collision-selective neurons in the
-> locust's optic lobe, called lobula giant movement detectors (LGMDs), which
-> respond specifically to approaching objects. Research upon LGMD neurons began
-> in the early 1970s. Initially, due to their large size, these neurons were
-> identified as motion detectors, but their role as looming detectors was
-> recognized over time. Since then, progress in neuroscience, computational
-> modeling of LGMD's visual neural circuits, and LGMD-based robotics has advanced
-> in tandem, each field supporting and driving the others. Today, with a deeper
-> understanding of LGMD neurons, LGMD-based models have significantly improved
-> collision-free navigation in mobile robots including ground and aerial robots.
-> This review highlights recent developments in LGMD research from the
-> perspectives of neuroscience, computational modeling, and robotics. It
-> emphasizes a biologically plausible research paradigm, where insights from
-> neuroscience inform real-world applications, which would in turn validate and
-> advance neuroscience. With strong support from extensive research and growing
-> application demand, this paradigm has reached a mature stage and demonstrates
-> versatility across different areas of neuroscience research, thereby enhancing
-> our understanding of the interconnections between neuroscience, computational
-> modeling, and robotics. Furthermore, other motion-sensitive neurons have also
-> shown promising potential for adopting this research paradigm.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance, instead focusing on bio-inspired robotics, neuroscience, and collision detection using LGMD neurons.
-
----
-
-## [To Analyze and Regulate Human-in-the-loop Learning for Congestion Games](https://arxiv.org/abs/2501.03055v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03055v2
-
-**Abstract:**
-> In congestion games, selfish users behave myopically to crowd to the shortest
-> paths, and the social planner designs mechanisms to regulate such selfish
-> routing through information or payment incentives. However, such mechanism
-> design requires the knowledge of time-varying traffic conditions and it is the
-> users themselves to learn and report past road experiences to the social
-> planner (e.g., Waze or Google Maps). When congestion games meet mobile
-> crowdsourcing, it is critical to incentivize selfish users to explore
-> non-shortest paths in the best exploitation-exploration trade-off. First, we
-> consider a simple but fundamental parallel routing network with one
-> deterministic path and multiple stochastic paths for users with an average
-> arrival probability $\lambda$. We prove that the current myopic routing policy
-> (widely used in Waze and Google Maps) misses both exploration (when strong
-> hazard belief) and exploitation (when weak hazard belief) as compared to the
-> social optimum. Due to the myopic policy's under-exploration, we prove that the
-> caused price of anarchy (PoA) is larger than
-> $\frac{1}{1-\rho^{\frac{1}{\lambda}}}$, which can be arbitrarily large as
-> discount factor $\rho \rightarrow 1$. To mitigate such huge efficiency loss, we
-> propose a novel selective information disclosure (SID) mechanism: we only
-> reveal the latest traffic information to users when they intend to over-explore
-> stochastic paths upon arrival, while hiding such information when they want to
-> under-explore. We prove that our mechanism successfully reduces PoA to be less
-> than $2$. Besides the parallel routing network, we further extend our
-> mechanism and PoA results to any linear path graphs with multiple intermediate
-> nodes.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper does not meet the 'MUST' criteria as it does not focus primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it addresses mechanism design for regulating human-in-the-loop learning in congestion games, with no apparent connection to LLMs or prompt engineering for text-based interactions."
-}
-
----
-
-## [Survival Analysis Revisited: Understanding and Unifying Poisson,
-  Exponential, and Cox Models in Fall Risk Analysis](https://arxiv.org/abs/2501.03058v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03058v1
-
-**Abstract:**
-> This paper explores foundational and applied aspects of survival analysis,
-> using fall risk assessment as a case study. It revisits key time-related
-> probability distributions and statistical methods, including logistic
-> regression, Poisson regression, Exponential regression, and the Cox
-> Proportional Hazards model, offering a unified perspective on their
-> relationships within the survival analysis framework. A contribution of this
-> work is the step-by-step derivation and clarification of the relationships
-> among these models, particularly demonstrating that Poisson regression in the
-> survival context is a specific case of the Cox model. These insights address
-> gaps in understanding and reinforce the simplicity and interpretability of
-> survival models. The paper also emphasizes the practical utility of survival
-> analysis by connecting theoretical insights with real-world applications. In
-> the context of fall detection, it demonstrates how these models can
-> simultaneously predict fall risk, analyze contributing factors, and estimate
-> time-to-event outcomes within a single streamlined framework. In contrast,
-> advanced deep learning methods often require complex post-hoc interpretation
-> and separate training for different tasks particularly when working with
-> structured numerical data. This highlights the enduring relevance of classical
-> statistical frameworks and makes survival models especially valuable in
-> healthcare settings, where explainability and robustness are critical. By
-> unifying foundational concepts and offering a cohesive perspective on
-> time-to-event analysis, this work serves as an accessible resource for
-> understanding survival models and applying them effectively to diverse
-> analytical challenges.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it primarily focuses on survival analysis, statistical methods, and their application in healthcare, with no apparent emphasis on prompt engineering, Large Language Models (LLMs), or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video
-  Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03059v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03059v1
-
-**Abstract:**
-> We consider the task of Image-to-Video (I2V) generation, which involves
-> transforming static images into realistic video sequences based on a textual
-> description. While recent advancements produce photorealistic outputs, they
-> frequently struggle to create videos with accurate and consistent object
-> motion, especially in multi-object scenarios. To address these limitations, we
-> propose a two-stage compositional framework that decomposes I2V generation
-> into: (i) An explicit intermediate representation generation stage, followed by
-> (ii) A video generation stage that is conditioned on this representation. Our
-> key innovation is the introduction of a mask-based motion trajectory as an
-> intermediate representation, that captures both semantic object information and
-> motion, enabling an expressive but compact representation of motion and
-> semantics. To incorporate the learned representation in the second stage, we
-> utilize object-level attention objectives. Specifically, we consider a spatial,
-> per-object, masked-cross attention objective, integrating object-specific
-> prompts into corresponding latent space regions and a masked spatio-temporal
-> self-attention objective, ensuring frame-to-frame consistency for each object.
-> We evaluate our method on challenging benchmarks with multi-object and
-> high-motion scenarios and empirically demonstrate that the proposed method
-> achieves state-of-the-art results in temporal coherence, motion realism, and
-> text-prompt faithfulness. Additionally, we introduce \benchmark, a new
-> challenging benchmark for single-object and multi-object I2V generation, and
-> demonstrate our method's superiority on this benchmark. Project page is
-> available at https://guyyariv.github.io/TTM/.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on Image-to-Video (I2V) generation, a generative AI application other than text generation driven by LLMs, and does not centrally concern prompt engineering for text-based interactions with LLMs, despite mentioning object-specific prompts as a component of the methodology.
-
----
-
-## [The Scaling Law for LoRA Base on Mutual Information Upper Bound](https://arxiv.org/abs/http://arxiv.org/abs/2501.03152v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03152v1
-
-**Abstract:**
-> LoRA (Low-Rank Adaptation) is a widely used model fine-tuning method. In
-> fine-tuning, the law among model performance, model parameters, and data
-> complexity has been a focal issue in the field. Existing methods often leverage
-> external metrics (such as cross-entropy or perplexity) to evaluate model
-> performance. In the fine-tuning process for large models, two types of
-> knowledge are typically involved: the frozen, general knowledge acquired by the
-> model during pre-training and the new knowledge learned through the LoRA module
-> from the current data. Generally, the less LoRA's learned knowledge relies on
-> the large model, the more it captures the specific knowledge of new data,
-> thereby enhancing its adaptability to new tasks. However, external metrics do
-> not readily capture the dependency relationship between these two types of
-> knowledge. Therefore, we designed an internal metric based on the Mutual
-> Information Upper Bound (MIUB) theory to investigate the scaling law of
-> large-model LoRA fine-tuning. In our experiments, we validated this approach on
-> benchmark datasets, using the Llama3-8B and Phi3-3B models. The results show
-> that the proposed MIUB metric aligns more accurately and stably with the
-> scaling law of LoRA fine-tuning compared to cross-entropy and perplexity.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on model fine-tuning (LoRA) and evaluating its performance using an internal metric (Mutual Information Upper Bound), rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Detecting AI-Generated Text in Educational Content: Leveraging Machine
-  Learning and Explainable AI for Academic Integrity](https://arxiv.org/abs/http://arxiv.org/abs/2501.03203v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03203v1
-
-**Abstract:**
-> This study seeks to enhance academic integrity by providing tools to detect
-> AI-generated content in student work using advanced technologies. The findings
-> promote transparency and accountability, helping educators maintain ethical
-> standards and supporting the responsible integration of AI in education. A key
-> contribution of this work is the generation of the CyberHumanAI dataset, which
-> has 1000 observations, 500 of which are written by humans and the other 500
-> produced by ChatGPT. We evaluate various machine learning (ML) and deep
-> learning (DL) algorithms on the CyberHumanAI dataset comparing human-written
-> and AI-generated content from Large Language Models (LLMs) (i.e., ChatGPT).
-> Results demonstrate that traditional ML algorithms, specifically XGBoost and
-> Random Forest, achieve high performance (83% and 81% accuracies respectively).
-> Results also show that classifying shorter content seems to be more challenging
-> than classifying longer content. Further, using Explainable Artificial
-> Intelligence (XAI) we identify discriminative features influencing the ML
-> model's predictions, where human-written content tends to use a practical
-> language (e.g., use and allow). Meanwhile AI-generated text is characterized by
-> more abstract and formal terms (e.g., realm and employ). Finally, a comparative
-> analysis with GPTZero shows that our narrowly focused, simple, and fine-tuned
-> model can outperform generalized systems like GPTZero. The proposed model
-> achieved approximately 77.5% accuracy compared to GPTZero's 48.5% accuracy when
-> tasked to classify Pure AI, Pure Human, and mixed class. GPTZero showed a
-> tendency to classify challenging and small-content cases as either mixed or
-> unrecognized while our proposed model showed a more balanced performance across
-> the three classes.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on detecting AI-generated text in educational content using machine learning and Explainable AI, rather than on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), as required by the criteria.
-
----
-
-## [BoostStep: Boosting mathematical capability of Large Language Models via
-  improved single-step reasoning](https://arxiv.org/abs/http://arxiv.org/abs/2501.03226v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03226v2
-
-**Abstract:**
-> Cutting-edge large language models (LLMs) demonstrate promising performance
-> in solving complex math problems with a divide-and-conquer pipeline and the
-> assistance of in-context learning (ICL) examples. However, their potential for
-> improvement is limited by two critical problems within their ICL examples:
-> granularity-mismatch and the ensuing negative-effect noise problem.
-> Specifically, the LLMs are capable of the dividing process yet mostly failed by
-> inaccurate reasoning within a few conquer steps, while the ICL examples
-> retrieved in question-grained sometimes lack relevant steps for a specific
-> challenging reasoning step. Further, this disconnect may hinder the correct
-> reasoning due to its irrelevance. To this end, we focus on improving the
-> reasoning quality within each step and present BoostStep. BoostStep aligns the
-> granularity between the retrieving and reasoning on step grained, and provides
-> highly related ICL examples for each reasoning step with a novel `first-try'
-> strategy. BoostStep provides more relevant examples than the coarse
-> question-grained strategy, enhancing the model reasoning quality within each
-> step steadily. BoostStep is a general and robust reasoning-enhancing method
-> that not only improves standalone reasoning performance but also integrates
-> seamlessly with Monte Carlo Tree Search methods (MCTS) to refine both candidate
-> generation and decision-making. Quantitatively, it improves GPT-4o and
-> Qwen2.5-Math-72B by 3.6\% and 2.0\% respectively on various mathematical
-> benchmarks, and 7.5\% gain combined with MCTS.
-
-**Decision Explanation:**
-Original decision: REJECT
-Although the paper improves LLM performance through refined in-context learning (ICL) examples, its primary focus is on enhancing mathematical reasoning capabilities within LLMs via a novel strategy, rather than prompt engineering techniques specifically for text-based interactions with LLMs.
-
----
-
-## [DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich
-  Paradigm for Direct Preference Optimization](https://arxiv.org/abs/http://arxiv.org/abs/2501.03271v3)
-**arXiv ID:** http://arxiv.org/abs/2501.03271v3
-
-**Abstract:**
-> The rapid rise of large language models (LLMs) has unlocked many applications
-> but also underscores the challenge of aligning them with diverse values and
-> preferences. Direct Preference Optimization (DPO) is central to alignment but
-> constrained by fixed divergences and limited feature transformations. We
-> propose DPO-Kernels, which integrates kernel methods to address these issues
-> through four key contributions: (i) Kernelized Representations with polynomial,
-> RBF, Mahalanobis, and spectral kernels for richer transformations, plus a
-> hybrid loss combining embedding-based and probability-based objectives; (ii)
-> Divergence Alternatives (Jensen-Shannon, Hellinger, Renyi, Bhattacharyya,
-> Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven
-> Selection metrics that automatically choose the best kernel-divergence pair;
-> and (iv) a Hierarchical Mixture of Kernels for both local precision and global
-> modeling. Evaluations on 12 datasets demonstrate state-of-the-art performance
-> in factuality, safety, reasoning, and instruction following. Grounded in
-> Heavy-Tailed Self-Regularization, DPO-Kernels maintains robust generalization
-> for LLMs, offering a comprehensive resource for further alignment research.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on Direct Preference Optimization (DPO) for aligning Large Language Models (LLMs) with diverse values and preferences, rather than prompt engineering specifically for LLMs. While LLMs are mentioned, the core subject is DPO methodology, not the design, optimization, or manipulation of textual input prompts for improving LLM performance."
-}
-
----
-
-## [Revolutionizing Encrypted Traffic Classification with MH-Net: A
-  Multi-View Heterogeneous Graph Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.03279v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03279v1
-
-**Abstract:**
-> With the growing significance of network security, the classification of
-> encrypted traffic has emerged as an urgent challenge. Traditional byte-based
-> traffic analysis methods are constrained by the rigid granularity of
-> information and fail to fully exploit the diverse correlations between bytes.
-> To address these limitations, this paper introduces MH-Net, a novel approach
-> for classifying network traffic that leverages multi-view heterogeneous traffic
-> graphs to model the intricate relationships between traffic bytes. The essence
-> of MH-Net lies in aggregating varying numbers of traffic bits into multiple
-> types of traffic units, thereby constructing multi-view traffic graphs with
-> diverse information granularities. By accounting for different types of byte
-> correlations, such as header-payload relationships, MH-Net further endows the
-> traffic graph with heterogeneity, significantly enhancing model performance.
-> Notably, we employ contrastive learning in a multi-task manner to strengthen
-> the robustness of the learned traffic unit representations. Experiments
-> conducted on the ISCX and CIC-IoT datasets for both the packet-level and
-> flow-level traffic classification tasks demonstrate that MH-Net achieves the
-> best overall performance compared to dozens of SOTA methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the primary focus criteria, as it revolves around network traffic classification using a multi-view heterogeneous graph model (MH-Net), with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to satisfy the mandatory 'MUST' criteria.
-
----
-
-## [A Decision-Based Heterogenous Graph Attention Network for Multi-Class
-  Fake News Detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.03290v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03290v1
-
-**Abstract:**
-> A promising tool for addressing fake news detection is Graph Neural Networks
-> (GNNs). However, most existing GNN-based methods rely on binary classification,
-> categorizing news as either real or fake. Additionally, traditional GNN models
-> use a static neighborhood for each node, making them susceptible to issues like
-> over-squashing. In this paper, we introduce a novel model named Decision-based
-> Heterogeneous Graph Attention Network (DHGAT) for fake news detection in a
-> semi-supervised setting. DHGAT effectively addresses the limitations of
-> traditional GNNs by dynamically optimizing and selecting the neighborhood type
-> for each node in every layer. It represents news data as a heterogeneous graph
-> where nodes (news items) are connected by various types of edges. The
-> architecture of DHGAT consists of a decision network that determines the
-> optimal neighborhood type and a representation network that updates node
-> embeddings based on this selection. As a result, each node learns an optimal
-> and task-specific computational graph, enhancing both the accuracy and
-> efficiency of the fake news detection process. We evaluate DHGAT on the LIAR
-> dataset, a large and challenging dataset for multi-class fake news detection,
-> which includes news items categorized into six classes. Our results demonstrate
-> that DHGAT outperforms existing methods, improving accuracy by approximately 4%
-> and showing robustness with limited labeled data.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new Graph Neural Network (GNN) architecture for multi-class fake news detection, which does not meet the 'MUST' criteria of primarily focusing on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs) or investigating methods for improving LLM performance through textual input prompt manipulation.
-
----
-
-## [Multi-Modal One-Shot Federated Ensemble Learning for Medical Data with
-  Vision Large Language Model](https://arxiv.org/abs/http://arxiv.org/abs/2501.03292v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03292v1
-
-**Abstract:**
-> Federated learning (FL) has attracted considerable interest in the medical
-> domain due to its capacity to facilitate collaborative model training while
-> maintaining data privacy. However, conventional FL methods typically
-> necessitate multiple communication rounds, leading to significant communication
-> overhead and delays, especially in environments with limited bandwidth.
-> One-shot federated learning addresses these issues by conducting model training
-> and aggregation in a single communication round, thereby reducing communication
-> costs while preserving privacy. Among these, one-shot federated ensemble
-> learning combines independently trained client models using ensemble techniques
-> such as voting, further boosting performance in non-IID data scenarios. On the
-> other hand, existing machine learning methods in healthcare predominantly use
-> unimodal data (e.g., medical images or textual reports), which restricts their
-> diagnostic accuracy and comprehensiveness. Therefore, the integration of
-> multi-modal data is proposed to address these shortcomings. In this paper, we
-> introduce FedMME, an innovative one-shot multi-modal federated ensemble
-> learning framework that utilizes multi-modal data for medical image analysis.
-> Specifically, FedMME capitalizes on vision large language models to produce
-> textual reports from medical images, employs a BERT model to extract textual
-> features from these reports, and amalgamates these features with visual
-> features to improve diagnostic accuracy. Experimental results show that our
-> method demonstrated superior performance compared to existing one-shot
-> federated learning methods in healthcare scenarios across four datasets with
-> various data distributions. For instance, it surpasses existing one-shot
-> federated learning approaches by more than 17.5% in accuracy on the RSNA
-> dataset when applying a Dirichlet distribution with ($\alpha$ = 0.3).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the development of a one-shot federated ensemble learning framework for medical data, using a Vision Large Language Model as a component, rather than focusing specifically on the engineering, design, or optimization of prompts for Large Language Models.
-
----
-
-## [Rethinking Byzantine Robustness in Federated Recommendation from Sparse
-  Aggregation Perspective](https://arxiv.org/abs/http://arxiv.org/abs/2501.03301v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03301v2
-
-**Abstract:**
-> To preserve user privacy in recommender systems, federated recommendation
-> (FR) based on federated learning (FL) emerges, keeping the personal data on the
-> local client and updating a model collaboratively. Unlike FL, FR has a unique
-> sparse aggregation mechanism, where the embedding of each item is updated by
-> only partial clients, instead of full clients in a dense aggregation of general
-> FL. Recently, as an essential principle of FL, model security has received
-> increasing attention, especially for Byzantine attacks, where malicious clients
-> can send arbitrary updates. The problem of exploring the Byzantine robustness
-> of FR is particularly critical since in the domains applying FR, e.g.,
-> e-commerce, malicious clients can be injected easily by registering new
-> accounts. However, existing Byzantine works neglect the unique sparse
-> aggregation of FR, making them unsuitable for our problem. Thus, we make the
-> first effort to investigate Byzantine attacks on FR from the perspective of
-> sparse aggregation, which is non-trivial: it is not clear how to define
-> Byzantine robustness under sparse aggregations and design Byzantine attacks
-> under limited knowledge/capability. In this paper, we reformulate the Byzantine
-> robustness under sparse aggregation by defining the aggregation for a single
-> item as the smallest execution unit. Then we propose a family of effective
-> attack strategies, named Spattack, which exploit the vulnerability in sparse
-> aggregation and are categorized along the adversary's knowledge and capability.
-> Extensive experimental results demonstrate that Spattack can effectively
-> prevent convergence and even break down defenses under a few malicious clients,
-> raising alarms for securing FR systems.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on Byzantine robustness in Federated Recommendation systems, specifically addressing sparse aggregation and security against malicious clients, with no primary concern or investigation into prompt engineering for Large Language Models (LLMs) or their textual input prompts."
-}
-
----
-
-## [Advanced Machine Learning Techniques for Social Support Detection on
-  Social Media](https://arxiv.org/abs/http://arxiv.org/abs/2501.03370v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03370v1
-
-**Abstract:**
-> The widespread use of social media highlights the need to understand its
-> impact, particularly the role of online social support. This study uses a
-> dataset focused on online social support, which includes binary and multiclass
-> classifications of social support content on social media. The classification
-> of social support is divided into three tasks. The first task focuses on
-> distinguishing between supportive and non-supportive. The second task aims to
-> identify whether the support is directed toward an individual or a group. The
-> third task categorizes the specific type of social support, grouping it into
-> categories such as Nation, LGBTQ, Black people, Women, Religion, and Other (if
-> it does not fit into the previously mentioned categories). To address data
-> imbalances in these tasks, we employed K-means clustering for balancing the
-> dataset and compared the results with the original unbalanced data. Using
-> advanced machine learning techniques, including transformers and zero-shot
-> learning approaches with GPT3, GPT4, and GPT4-o, we predict social support
-> levels in various contexts. The effectiveness of the dataset is evaluated using
-> baseline models across different learning approaches, with transformer-based
-> methods demonstrating superior performance. Additionally, we achieved a 0.4\%
-> increase in the macro F1 score for the second task and a 0.7\% increase for the
-> third task, compared to previous work utilizing traditional machine learning
-> with psycholinguistic and unigram-based TF-IDF values.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus primarily on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it utilizes LLMs (GPT3, GPT4) as components for social support detection on social media, with the primary focus being on advanced machine learning techniques for classification tasks, not prompt engineering.
-
----
-
-## [License Plate Images Generation with Diffusion Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.03374v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03374v1
-
-**Abstract:**
-> Despite the evident practical importance of license plate recognition (LPR),
-> corresponding research is limited by the volume of publicly available datasets
-> due to privacy regulations such as the General Data Protection Regulation
-> (GDPR). To address this challenge, synthetic data generation has emerged as a
-> promising approach. In this paper, we propose to synthesize realistic license
-> plates (LPs) using diffusion models, inspired by recent advances in image and
-> video generation. In our experiments a diffusion model was successfully trained
-> on a Ukrainian LP dataset, and 1000 synthetic images were generated for
-> detailed analysis. Through manual classification and annotation of the
-> generated images, we performed a thorough study of the model output, such as
-> success rate, character distributions, and type of failures. Our contributions
-> include experimental validation of the efficacy of diffusion models for LP
-> synthesis, along with insights into the characteristics of the generated data.
-> Furthermore, we have prepared a synthetic dataset consisting of 10,000 LP
-> images, publicly available at https://zenodo.org/doi/10.5281/zenodo.13342102.
-> Conducted experiments empirically confirm the usefulness of synthetic data for
-> the LPR task. Despite the initial performance gap between the model trained
-> with real and synthetic data, the expansion of the training data set with
-> pseudolabeled synthetic data leads to an improvement in LPR accuracy by 3%
-> compared to baseline.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on generating license plate images using diffusion models, which is an application of generative AI in image generation, not text generation driven by Large Language Models (LLMs), and does not investigate prompt engineering for LLMs.
-
----
-
-## [Activating Associative Disease-Aware Vision Token Memory for LLM-Based
-  X-ray Report Generation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03458v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03458v1
-
-**Abstract:**
-> X-ray image based medical report generation achieves significant progress in
-> recent years with the help of the large language model, however, these models
-> have not fully exploited the effective information in visual image regions,
-> resulting in reports that are linguistically sound but insufficient in
-> describing key diseases. In this paper, we propose a novel associative
-> memory-enhanced X-ray report generation model that effectively mimics the
-> process of professional doctors writing medical reports. It considers both the
-> mining of global and local visual information and associates historical report
-> information to better complete the writing of the current report. Specifically,
-> given an X-ray image, we first utilize a classification model along with its
-> activation maps to accomplish the mining of visual regions highly associated
-> with diseases and the learning of disease query tokens. Then, we employ a
-> visual Hopfield network to establish memory associations for disease-related
-> tokens, and a report Hopfield network to retrieve report memory information.
-> This process facilitates the generation of high-quality reports based on a
-> large language model and achieves state-of-the-art performance on multiple
-> benchmark datasets, including the IU X-ray, MIMIC-CXR, and Chexpert Plus. The
-> source code of this work is released on
-> https://github.com/Event-AHU/Medical_Image_Analysis.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a novel model for X-ray report generation in a medical context, which falls under the 'MUST NOT' criterion of being primarily concerned with medical subjects, and its core subject is not prompt engineering for text-based interactions with LLMs.
-
----
-
-## [Radar Signal Recognition through Self-Supervised Learning and Domain
-  Adaptation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03461v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03461v2
-
-**Abstract:**
-> Automatic radar signal recognition (RSR) plays a pivotal role in electronic
-> warfare (EW), as accurately classifying radar signals is critical for informing
-> decision-making processes. Recent advances in deep learning have shown
-> significant potential in improving RSR performance in domains with ample
-> annotated data. However, these methods fall short in EW scenarios where
-> annotated RF data are scarce or impractical to obtain. To address these
-> challenges, we introduce a self-supervised learning (SSL) method which utilises
-> masked signal modelling and RF domain adaption to enhance RSR performance in
-> environments with limited RF samples and labels. Specifically, we investigate
-> pre-training masked autoencoders (MAE) on baseband in-phase and quadrature
-> (I/Q) signals from various RF domains and subsequently transfer the learned
-> representation to the radar domain, where annotated data are limited. Empirical
-> results show that our lightweight self-supervised ResNet model with domain
-> adaptation achieves up to a 17.5% improvement in 1-shot classification accuracy
-> when pre-trained on in-domain signals (i.e., radar signals) and up to a 16.31%
-> improvement when pre-trained on out-of-domain signals (i.e., comm signals),
-> compared to its baseline without SSL. We also provide reference results for
-> several MAE designs and pre-training strategies, establishing a new benchmark
-> for few-shot radar signal classification.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on self-supervised learning and domain adaptation for radar signal recognition, with no discussion on prompt engineering, Large Language Models (LLMs), or textual input prompts, thus failing to meet all 'MUST' criteria.
-
----
-
-## [KG-TRICK: Unifying Textual and Relational Information Completion of
-  Knowledge for Multilingual Knowledge Graphs](https://arxiv.org/abs/http://arxiv.org/abs/2501.03560v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03560v1
-
-**Abstract:**
-> Multilingual knowledge graphs (KGs) provide high-quality relational and
-> textual information for various NLP applications, but they are often
-> incomplete, especially in non-English languages. Previous research has shown
-> that combining information from KGs in different languages aids either
-> Knowledge Graph Completion (KGC), the task of predicting missing relations
-> between entities, or Knowledge Graph Enhancement (KGE), the task of predicting
-> missing textual information for entities. Although previous efforts have
-> considered KGC and KGE as independent tasks, we hypothesize that they are
-> interdependent and mutually beneficial. To this end, we introduce KG-TRICK, a
-> novel sequence-to-sequence framework that unifies the tasks of textual and
-> relational information completion for multilingual KGs. KG-TRICK demonstrates
-> that: i) it is possible to unify the tasks of KGC and KGE into a single
-> framework, and ii) combining textual information from multiple languages is
-> beneficial to improve the completeness of a KG. As part of our contributions,
-> we also introduce WikiKGE10++, the largest manually-curated benchmark for
-> textual information completion of KGs, which features over 25,000 entities
-> across 10 diverse languages.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on unifying Knowledge Graph Completion and Enhancement tasks for multilingual Knowledge Graphs, utilizing a sequence-to-sequence framework, without explicit concentration on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [From Code to Compliance: Assessing ChatGPT's Utility in Designing an
-  Accessible Webpage -- A Case Study](https://arxiv.org/abs/http://arxiv.org/abs/2501.03572v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03572v1
-
-**Abstract:**
-> Web accessibility ensures that individuals with disabilities can access and
-> interact with digital content without barriers, yet a significant majority of
-> most used websites fail to meet accessibility standards. This study evaluates
-> ChatGPT's (GPT-4o) ability to generate and improve web pages in line with Web
-> Content Accessibility Guidelines (WCAG). While ChatGPT can effectively address
-> accessibility issues when prompted, its default code often lacks compliance,
-> reflecting limitations in its training data and prevailing inaccessible web
-> practices. Automated and manual testing revealed strengths in resolving simple
-> issues but challenges with complex tasks, requiring human oversight and
-> additional iterations. Unlike prior studies, we incorporate manual evaluation,
-> dynamic elements, and use the visual reasoning capability of ChatGPT along with
-> the prompts to fix accessibility issues. Providing screenshots alongside
-> prompts enhances the LLM's ability to address accessibility issues by allowing
-> it to analyze surrounding components, such as determining appropriate contrast
-> colors. We found that effective prompt engineering, such as providing concise,
-> structured feedback and incorporating visual aids, significantly enhances
-> ChatGPT's performance. These findings highlight the potential and limitations
-> of large language models for accessible web development, offering practical
-> guidance for developers to create more inclusive websites.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper's primary focus is leveraging ChatGPT for accessible web development, not prompt engineering for Large Language Models. While it mentions effective prompt engineering techniques, this is secondary to its main objective of improving web accessibility."
-}
-
----
-
-## [Action Quality Assessment via Hierarchical Pose-guided Multi-stage
-  Contrastive Regression](https://arxiv.org/abs/http://arxiv.org/abs/2501.03674v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03674v1
-
-**Abstract:**
-> Action Quality Assessment (AQA), which aims at automatic and fair evaluation
-> of athletic performance, has gained increasing attention in recent years.
-> However, athletes are often in rapid movement and the corresponding visual
-> appearance variances are subtle, making it challenging to capture fine-grained
-> pose differences and leading to poor estimation performance. Furthermore, most
-> common AQA tasks, such as diving in sports, are usually divided into multiple
-> sub-actions, each of which contains different durations. However, existing
-> methods focus on segmenting the video into fixed frames, which disrupts the
-> temporal continuity of sub-actions resulting in unavoidable prediction errors.
-> To address these challenges, we propose a novel action quality assessment
-> method through hierarchically pose-guided multi-stage contrastive regression.
-> Firstly, we introduce a multi-scale dynamic visual-skeleton encoder to capture
-> fine-grained spatio-temporal visual and skeletal features. Then, a procedure
-> segmentation network is introduced to separate different sub-actions and obtain
-> segmented features. Afterwards, the segmented visual and skeletal features are
-> both fed into a multi-modal fusion module as physics structural priors, to
-> guide the model in learning refined activity similarities and variances.
-> Finally, a multi-stage contrastive learning regression approach is employed to
-> learn discriminative representations and output prediction results. In
-> addition, we introduce a newly-annotated FineDiving-Pose Dataset to improve the
-> current low-quality human pose labels. In experiments, the results on
-> FineDiving and MTL-AQA datasets demonstrate the effectiveness and superiority
-> of our proposed approach. Our source code and dataset are available at
-> https://github.com/Lumos0507/HP-MCoRe.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on Action Quality Assessment using computer vision and pose-guided contrastive regression, with no mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, thus failing to meet all 'MUST' criteria.
-
----
-
-## [MAJL: A Model-Agnostic Joint Learning Framework for Music Source
-  Separation and Pitch Estimation](https://arxiv.org/abs/http://arxiv.org/abs/2501.03689v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03689v1
-
-**Abstract:**
-> Music source separation and pitch estimation are two vital tasks in music
-> information retrieval. Typically, the input of pitch estimation is obtained
-> from the output of music source separation. Therefore, existing methods have
-> tried to perform these two tasks simultaneously, so as to leverage the mutually
-> beneficial relationship between both tasks. However, these methods still face
-> two critical challenges that limit the improvement of both tasks: the lack of
-> labeled data and joint learning optimization. To address these challenges, we
-> propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL
-> is a generic framework and can use variant models for each task. It includes a
-> two-stage training method and a dynamic weighting method named Dynamic Weights
-> on Hard Samples (DWHS), which addresses the lack of labeled data and joint
-> learning optimization, respectively. Experimental results on public music
-> datasets show that MAJL outperforms state-of-the-art methods on both tasks,
-> with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for
-> music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch
-> estimation. Furthermore, comprehensive studies not only validate the
-> effectiveness of each component of MAJL, but also indicate the great generality
-> of MAJL in adapting to different model architectures.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on music source separation and pitch estimation, leveraging joint learning frameworks, without any mention of Large Language Models (LLMs), prompt engineering, or text-based interactions, thus falling outside the specified scope.
-
----
-
-## [AuxDepthNet: Real-Time Monocular 3D Object Detection with
-  Depth-Sensitive Features](https://arxiv.org/abs/http://arxiv.org/abs/2501.03700v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03700v1
-
-**Abstract:**
-> Monocular 3D object detection is a challenging task in autonomous systems due
-> to the lack of explicit depth information in single-view images. Existing
-> methods often depend on external depth estimators or expensive sensors, which
-> increase computational complexity and hinder real-time performance. To overcome
-> these limitations, we propose AuxDepthNet, an efficient framework for real-time
-> monocular 3D object detection that eliminates the reliance on external depth
-> maps or pre-trained depth models. AuxDepthNet introduces two key components:
-> the Auxiliary Depth Feature (ADF) module, which implicitly learns
-> depth-sensitive features to improve spatial reasoning and computational
-> efficiency, and the Depth Position Mapping (DPM) module, which embeds depth
-> positional information directly into the detection process to enable accurate
-> object localization and 3D bounding box regression. Leveraging the DepthFusion
-> Transformer architecture, AuxDepthNet globally integrates visual and
-> depth-sensitive features through depth-guided interactions, ensuring robust and
-> efficient detection. Extensive experiments on the KITTI dataset show that
-> AuxDepthNet achieves state-of-the-art performance, with $\text{AP}_{3D}$ scores
-> of 24.72\% (Easy), 18.63\% (Moderate), and 15.31\% (Hard), and
-> $\text{AP}_{\text{BEV}}$ scores of 34.11\% (Easy), 25.18\% (Moderate), and
-> 21.90\% (Hard) at an IoU threshold of 0.7.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on monocular 3D object detection for autonomous systems, involving depth estimation and sensor technologies, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions, thus failing all 'MUST' criteria.
-
----
-
-## [Self-adaptive vision-language model for 3D segmentation of pulmonary
-  artery and vein](https://arxiv.org/abs/http://arxiv.org/abs/2501.03722v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03722v1
-
-**Abstract:**
-> Accurate segmentation of pulmonary structures is crucial in clinical
-> diagnosis, disease study, and treatment planning. Significant progress has been
-> made in deep learning-based segmentation techniques, but most require much
-> labeled data for training. Consequently, developing precise segmentation
-> methods that demand fewer labeled datasets is paramount in medical image
-> analysis. The emergence of pre-trained vision-language foundation models, such
-> as CLIP, recently opened the door for universal computer vision tasks.
-> Exploiting the generalization ability of these pre-trained foundation models on
-> downstream tasks, such as segmentation, leads to unexpected performance with a
-> relatively small amount of labeled data. However, exploring these models for
-> pulmonary artery-vein segmentation is still limited. This paper proposes a
-> novel framework called Language-guided self-adaptive Cross-Attention Fusion
-> Framework. Our method adopts pre-trained CLIP as a strong feature extractor for
-> generating the segmentation of 3D CT scans, while adaptively aggregating the
-> cross-modality of text and image representations. We propose a specially
-> designed adapter module to fine-tune pre-trained CLIP with a self-adaptive
-> learning strategy to effectively fuse the two modalities of embeddings. We
-> extensively validate our method on a local dataset, which is the largest
-> pulmonary artery-vein CT dataset to date and consists of 718 labeled data in
-> total. The experiments show that our method outperformed other state-of-the-art
-> methods by a large margin. Our data and code will be made publicly available
-> upon acceptance.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on medical image analysis (3D segmentation of pulmonary artery and vein) and the development of a vision-language model, rather than the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs), and does not demonstrate the impact of textual input prompts on LLM output.
-
----
-
-## [Deep Sylvester Posterior Inference for Adaptive Compressed Sensing in
-  Ultrasound Imaging](https://arxiv.org/abs/http://arxiv.org/abs/2501.03825v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03825v1
-
-**Abstract:**
-> Ultrasound images are commonly formed by sequential acquisition of
-> beam-steered scan-lines. Minimizing the number of required scan-lines can
-> significantly enhance frame rate, field of view, energy efficiency, and data
-> transfer speeds. Existing approaches typically use static subsampling schemes
-> in combination with sparsity-based or, more recently, deep-learning-based
-> recovery. In this work, we introduce an adaptive subsampling method that
-> maximizes intrinsic information gain in-situ, employing a Sylvester Normalizing
-> Flow encoder to infer an approximate Bayesian posterior under partial
-> observation in real-time. Using the Bayesian posterior and a deep generative
-> model for future observations, we determine the subsampling scheme that
-> maximizes the mutual information between the subsampled observations, and the
-> next frame of the video. We evaluate our approach using the EchoNet cardiac
-> ultrasound video dataset and demonstrate that our active sampling method
-> outperforms competitive baselines, including uniform and variable-density
-> random sampling, as well as equidistantly spaced scan-lines, improving mean
-> absolute reconstruction error by 15%. Moreover, posterior inference and the
-> sampling scheme generation are performed in just 0.015 seconds (66Hz), making
-> it fast enough for real-time 2D ultrasound imaging applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on adaptive compressed sensing in ultrasound imaging, employing deep learning for image reconstruction, and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through prompt engineering, thus failing to meet all 'MUST' criteria.
-
----
-
-## [TACLR: A Scalable and Efficient Retrieval-based Method for Industrial
-  Product Attribute Value Identification](https://arxiv.org/abs/http://arxiv.org/abs/2501.03835v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03835v1
-
-**Abstract:**
-> Product Attribute Value Identification (PAVI) involves identifying attribute
-> values from product profiles, a key task for improving product search,
-> recommendations, and business analytics on e-commerce platforms. However,
-> existing PAVI methods face critical challenges, such as inferring implicit
-> values, handling out-of-distribution (OOD) values, and producing normalized
-> outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive
-> Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR
-> formulates PAVI as an information retrieval task by encoding product profiles
-> and candidate values into embeddings and retrieving values based on their
-> similarity to the item embedding. It leverages contrastive training with
-> taxonomy-aware hard negative sampling and employs adaptive inference with
-> dynamic thresholds. TACLR offers three key advantages: (1) it effectively
-> handles implicit and OOD values while producing normalized outputs; (2) it
-> scales to thousands of categories, tens of thousands of attributes, and
-> millions of values; and (3) it supports efficient inference for high-load
-> industrial scenarios. Extensive experiments on proprietary and public datasets
-> validate the effectiveness and efficiency of TACLR. Moreover, it has been
-> successfully deployed in a real-world e-commerce platform, processing millions
-> of product listings daily while supporting dynamic, large-scale attribute
-> taxonomies.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs). Instead, it introduces a retrieval-based method (TACLR) for Product Attribute Value Identification, primarily concerned with information retrieval and taxonomy-aware contrastive learning, without any central focus on prompt engineering for LLMs.
-
----
-
-## [Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video
-  Generation Control](https://arxiv.org/abs/http://arxiv.org/abs/2501.03847v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03847v2
-
-**Abstract:**
-> Diffusion models have demonstrated impressive performance in generating
-> high-quality videos from text prompts or images. However, precise control over
-> the video generation process, such as camera manipulation or content editing,
-> remains a significant challenge. Existing methods for controlled video
-> generation are typically limited to a single control type, lacking the
-> flexibility to handle diverse control demands. In this paper, we introduce
-> Diffusion as Shader (DaS), a novel approach that supports multiple video
-> control tasks within a unified architecture. Our key insight is that achieving
-> versatile video control necessitates leveraging 3D control signals, as videos
-> are fundamentally 2D renderings of dynamic 3D content. Unlike prior methods
-> limited to 2D control signals, DaS leverages 3D tracking videos as control
-> inputs, making the video diffusion process inherently 3D-aware. This innovation
-> allows DaS to achieve a wide range of video controls by simply manipulating the
-> 3D tracking videos. A further advantage of using 3D tracking videos is their
-> ability to effectively link frames, significantly enhancing the temporal
-> consistency of the generated videos. With just 3 days of fine-tuning on 8 H800
-> GPUs using less than 10k videos, DaS demonstrates strong control capabilities
-> across diverse tasks, including mesh-to-video generation, camera control,
-> motion transfer, and object manipulation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on video generation control using diffusion models, which falls under image/video generation rather than text generation driven by Large Language Models (LLMs), violating the 'MUST NOT' criteria related to applications of generative AI.
-
----
-
-## [CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds
-  Ratio on High-Resolution Point Clouds](https://arxiv.org/abs/http://arxiv.org/abs/2501.03879v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03879v1
-
-**Abstract:**
-> Recent research has demonstrated that Large Language Models (LLMs) are not
-> limited to text-only tasks but can also function as multimodal models across
-> various modalities, including audio, images, and videos. In particular,
-> research on 3D Large Multimodal Models (3D LMMs) is making notable strides,
-> driven by the potential of processing higher-dimensional data like point
-> clouds. However, upon closer examination, we find that the visual and textual
-> content within each sample of existing training datasets lacks both high
-> informational granularity and clarity, which serve as a bottleneck for precise
-> cross-modal understanding. To address these issues, we propose CL3DOR,
-> Contrastive Learning for 3D large multimodal models via Odds ratio on
-> high-Resolution point clouds, designed to ensure greater specificity and
-> clarity in both visual and textual content. Specifically, we increase the
-> density of point clouds per object and construct informative hard negative
-> responses in the training dataset to penalize unwanted responses. To leverage
-> hard negative responses, we incorporate the odds ratio as an auxiliary term for
-> contrastive learning into the conventional language modeling loss. CL3DOR
-> achieves state-of-the-art performance in 3D scene understanding and reasoning
-> benchmarks. Additionally, we demonstrate the effectiveness of CL3DOR's key
-> components through extensive experiments.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a new multimodal learning method (CL3DOR) for 3D Large Multimodal Models, which includes but is not centered on prompt engineering for Large Language Models (LLMs). The core contribution is the contrastive learning approach for multimodal understanding, not the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One
-  Vision Token](https://arxiv.org/abs/http://arxiv.org/abs/2501.03895v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03895v1
-
-**Abstract:**
-> The advent of real-time large multimodal models (LMMs) like GPT-4o has
-> sparked considerable interest in efficient LMMs. LMM frameworks typically
-> encode visual inputs into vision tokens (continuous representations) and
-> integrate them and textual instructions into the context of large language
-> models (LLMs), where large-scale parameters and numerous context tokens
-> (predominantly vision tokens) result in substantial computational overhead.
-> Previous efforts towards efficient LMMs always focus on replacing the LLM
-> backbone with smaller models, while neglecting the crucial issue of token
-> quantity. In this paper, we introduce LLaVA-Mini, an efficient LMM with minimal
-> vision tokens. To achieve a high compression ratio of vision tokens while
-> preserving visual information, we first analyze how LMMs understand vision
-> tokens and find that most vision tokens only play a crucial role in the early
-> layers of LLM backbone, where they mainly fuse visual information into text
-> tokens. Building on this finding, LLaVA-Mini introduces modality pre-fusion to
-> fuse visual information into text tokens in advance, thereby facilitating the
-> extreme compression of vision tokens fed to LLM backbone into one token.
-> LLaVA-Mini is a unified large multimodal model that can support the
-> understanding of images, high-resolution images, and videos in an efficient
-> manner. Experiments across 11 image-based and 7 video-based benchmarks
-> demonstrate that LLaVA-Mini outperforms LLaVA-v1.5 with just 1 vision token
-> instead of 576. Efficiency analyses reveal that LLaVA-Mini can reduce FLOPs by
-> 77%, deliver low-latency responses within 40 milliseconds, and process over
-> 10,000 frames of video on the GPU hardware with 24GB of memory.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on developing an efficient Large Multimodal Model (LMM) for image and video processing, optimizing vision tokens, and reducing computational overhead, rather than engineering or optimizing prompts specifically for Large Language Models (LLMs) and their text-based interactions."
-}
-
----
-
-## [Explainable Time Series Prediction of Tyre Energy in Formula One Race
-  Strategy](https://arxiv.org/abs/http://arxiv.org/abs/2501.04067v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04067v1
-
-**Abstract:**
-> Formula One (F1) race strategy takes place in a high-pressure and fast-paced
-> environment where split-second decisions can drastically affect race results.
-> Two of the core decisions of race strategy are when to make pit stops (i.e.
-> replace the cars' tyres) and which tyre compounds (hard, medium or soft, in
-> normal conditions) to select. The optimal pit stop decisions can be determined
-> by estimating the tyre degradation of these compounds, which in turn can be
-> computed from the energy applied to each tyre, i.e. the tyre energy. In this
-> work, we trained deep learning models, using the Mercedes-AMG PETRONAS F1
-> team's historic race data consisting of telemetry, to forecast tyre energies
-> during races. Additionally, we fitted XGBoost, a decision tree-based machine
-> learning algorithm, to the same dataset and compared the results, with both
-> giving impressive performance. Furthermore, we incorporated two different
-> explainable AI methods, namely feature importance and counterfactual
-> explanations, to gain insights into the reasoning behind the forecasts. Our
-> contributions thus result in an explainable, automated method which could
-> assist F1 teams in optimising their race strategy.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing an explainable AI method for time series prediction in Formula One racing, utilizing deep learning models and XGBoost, with no primary emphasis on the engineering, design, or optimization of prompts specifically for Large Language Models (LLMs).
-
----
-
-## [Explainable Reinforcement Learning for Formula One Race Strategy](https://arxiv.org/abs/http://arxiv.org/abs/2501.04068v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04068v1
-
-**Abstract:**
-> In Formula One, teams compete to develop their cars and achieve the highest
-> possible finishing position in each race. During a race, however, teams are
-> unable to alter the car, so they must improve their cars' finishing positions
-> via race strategy, i.e. optimising their selection of which tyre compounds to
-> put on the car and when to do so. In this work, we introduce a reinforcement
-> learning model, RSRL (Race Strategy Reinforcement Learning), to control race
-> strategies in simulations, offering a faster alternative to the industry
-> standard of hard-coded and Monte Carlo-based race strategies. Controlling cars
-> with a pace equating to an expected finishing position of P5.5 (where P1
-> represents first place and P20 is last place), RSRL achieves an average
-> finishing position of P5.33 on our test race, the 2023 Bahrain Grand Prix,
-> outperforming the best baseline of P5.63. We then demonstrate, in a
-> generalisability study, how performance for one track or multiple tracks can be
-> prioritised via training. Further, we supplement model predictions with feature
-> importance, decision tree-based surrogate models, and decision tree
-> counterfactuals towards improving user trust in the model. Finally, we provide
-> illustrations which exemplify our approach in real-world situations, drawing
-> parallels between simulations and reality.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a reinforcement learning model for optimizing Formula One race strategies, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text generation, thus failing to meet the primary 'MUST' criteria.
-
----
-
-## [Multi-armed Bandit and Backbone boost Lin-Kernighan-Helsgaun Algorithm
-  for the Traveling Salesman Problems](https://arxiv.org/abs/http://arxiv.org/abs/2501.04072v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04072v1
-
-**Abstract:**
-> The Lin-Kernighan-Helsgaun (LKH) heuristic is a classic local search
-> algorithm for the Traveling Salesman Problem (TSP). LKH introduces an
-> $\alpha$-value to replace the traditional distance metric for evaluating the
-> edge quality, which leads to a significant improvement. However, we observe
-> that the $\alpha$-value does not make full use of the historical information
-> during the search, and single guiding information often makes LKH hard to
-> escape from some local optima. To address the above issues, we propose a novel
-> way to extract backbone information during the TSP local search process, which
-> is dynamic and can be updated once a local optimal solution is found. We
-> further propose to combine backbone information, $\alpha$-value, and distance
-> to evaluate the edge quality so as to guide the search. Moreover, we abstract
-> their different combinations to arms in a multi-armed bandit (MAB) and use an
-> MAB model to help the algorithm select an appropriate evaluation metric
-> dynamically. Both the backbone information and MAB can provide diverse guiding
-> information and learn from the search history to suggest the best metric. We
-> apply our methods to LKH and LKH-3, which is an extension version of LKH that
-> can be used to solve about 40 variant problems of TSP and Vehicle Routing
-> Problem (VRP). Extensive experiments show the excellent performance and
-> generalization capability of our proposed method, significantly improving LKH
-> for TSP and LKH-3 for two representative TSP and VRP variants, the Colored TSP
-> (CTSP) and Capacitated VRP with Time Windows (CVRPTW).
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper does not meet the 'MUST' criteria as it focuses on optimizing the Traveling Salesman Problem algorithm using multi-armed bandit and backbone boost, with no primary focus on the engineering, design, or optimization of prompts for Large Language Models (LLMs), nor does it investigate the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Enhancing Distribution and Label Consistency for Graph
-  Out-of-Distribution Generalization](https://arxiv.org/abs/http://arxiv.org/abs/2501.04102v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04102v1
-
-**Abstract:**
-> To deal with distribution shifts in graph data, various graph
-> out-of-distribution (OOD) generalization techniques have been recently
-> proposed. These methods often employ a two-step strategy that first creates
-> augmented environments and subsequently identifies invariant subgraphs to
-> improve generalizability. Nevertheless, this approach could be suboptimal from
-> the perspective of consistency. First, the process of augmenting environments
-> by altering the graphs while preserving labels may lead to graphs that are not
-> realistic or meaningfully related to the origin distribution, thus lacking
-> distribution consistency. Second, the extracted subgraphs are obtained from
-> directly modifying graphs, and may not necessarily maintain a consistent
-> predictive relationship with their labels, thereby impacting label consistency.
-> In response to these challenges, we introduce an innovative approach that aims
-> to enhance these two types of consistency for graph OOD generalization. We
-> propose a modifier to obtain both augmented and invariant graphs in a unified
-> manner. With the augmented graphs, we enrich the training data without
-> compromising the integrity of label-graph relationships. The label consistency
-> enhancement in our framework further preserves the supervision information in
-> the invariant graph. We conduct extensive experiments on real-world datasets to
-> demonstrate the superiority of our framework over other state-of-the-art
-> baselines.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on enhancing distribution and label consistency for graph out-of-distribution generalization, with no apparent connection to Large Language Models (LLMs), prompt engineering, or text-based interactions, thus failing to meet the 'MUST' criteria.
-
----
-
-## [Multimodal Multihop Source Retrieval for Web Question Answering](https://arxiv.org/abs/http://arxiv.org/abs/2501.04173v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04173v1
-
-**Abstract:**
-> This work deals with the challenge of learning and reasoning over multi-modal
-> multi-hop question answering (QA). We propose a graph reasoning network based
-> on the semantic structure of the sentences to learn multi-source reasoning
-> paths and find the supporting facts across both image and text modalities for
-> answering the question. In this paper, we investigate the importance of graph
-> structure for multi-modal multi-hop question answering. Our analysis is
-> centered on WebQA. We construct a strong baseline model, that finds relevant
-> sources using a pairwise classification task. We establish that, with the
-> proper use of feature representations from pre-trained models, graph structure
-> helps in improving multi-modal multi-hop question answering. We point out that
-> both graph structure and adjacency matrix are task-related prior knowledge, and
-> graph structure can be leveraged to improve the retrieval performance for the
-> task. Experiments and visualized analysis demonstrate that message propagation
-> over graph networks or the entire graph structure can replace massive
-> multimodal transformers with token-wise cross-attention. We demonstrated the
-> applicability of our method and show a performance gain of 4.6%
-> retrieval F1 score over the transformer baselines, despite being a very light
-> model. We further demonstrated the applicability of our model to a large scale
-> retrieval setting.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on multimodal multihop question answering using graph reasoning networks, with emphasis on leveraging image and text modalities, rather than specifically engineering prompts for Large Language Models (LLMs). It does not investigate, analyze, or propose methods for improving LLM performance through the manipulation of textual input prompts as required.
-
----
-
-## [HIVEX: A High-Impact Environment Suite for Multi-Agent Research
-  (extended version)](https://arxiv.org/abs/http://arxiv.org/abs/2501.04180v2)
-**arXiv ID:** http://arxiv.org/abs/2501.04180v2
-
-**Abstract:**
-> Games have been vital test beds for the rapid development of Agent-based
-> research. Remarkable progress has been achieved in the past, but it is unclear
-> if the findings equip for real-world problems. While pressure grows, some of
-> the most critical ecological challenges can find mitigation and prevention
-> solutions through technology and its applications. Most real-world domains
-> include multi-agent scenarios and require machine-machine and human-machine
-> collaboration. Open-source environments have not advanced and are often toy
-> scenarios, too abstract or not suitable for multi-agent research. By mimicking
-> real-world problems and increasing the complexity of environments, we hope to
-> advance state-of-the-art multi-agent research and inspire researchers to work
-> on immediate real-world problems. Here, we present HIVEX, an environment suite
-> to benchmark multi-agent research focusing on ecological challenges. HIVEX
-> includes the following environments: Wind Farm Control, Wildfire Resource
-> Management, Drone-Based Reforestation, Ocean Plastic Collection, and Aerial
-> Wildfire Suppression. We provide environments, training examples, and baselines
-> for the main and sub-tasks. All trained models resulting from the experiments
-> of this work are hosted on Hugging Face. We also provide a leaderboard on
-> Hugging Face and encourage the community to submit models trained on our
-> environment suite.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on multi-agent research, ecological challenges, and the development of an environment suite (HIVEX), with no clear emphasis on prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance.
-
----
-
-## [Fixed Points of Deep Neural Networks: Emergence, Stability, and
-  Applications](https://arxiv.org/abs/http://arxiv.org/abs/2501.04182v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04182v1
-
-**Abstract:**
-> We present numerical and analytical results on the formation and stability of
-> a family of fixed points of deep neural networks (DNNs). Such fixed points
-> appear in a class of DNNs when dimensions of input and output vectors are the
-> same. We demonstrate examples of applications of such networks in supervised,
-> semi-supervised and unsupervised learning such as encoding/decoding of images,
-> restoration of damaged images among others.
-> We present several numerical and analytical results. First, we show that for
-> untrained DNNs with weights and biases initialized by normally distributed
-> random variables, only one fixed point exists. This result holds for DNNs
-> with any depth (number of layers) $L$, any layer width $N$, and sigmoid-type
-> activation functions. Second, it has been shown that for a DNN whose parameters
-> (weights and biases) are initialized by a ``light-tailed'' distribution of
-> weights (e.g. normal distribution), after training the distribution of these
-> parameters becomes ``heavy-tailed''. This motivates our study of DNNs with
-> ``heavy-tailed'' initialization. For such DNNs we show numerically
-> that training leads to the emergence of $Q(N,L)$ fixed points, where
-> $Q(N,L)$ is a positive integer which depends on the number of layers $L$ and
-> layer width $N$. We further observe numerically that for fixed $N = N_0$ the
-> function $Q(N_0, L)$ is non-monotone; that is, it initially grows as $L$
-> increases and then decreases to 1.
-> This non-monotone behavior of $Q(N_0, L)$ is also obtained by analytical
-> derivation of an equation for the Empirical Spectral Distribution (ESD) of the
-> input-output Jacobian, followed by numerical solution of this equation.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on the analysis of fixed points in deep neural networks (DNNs), including their emergence, stability, and applications in image processing, but does not address prompt engineering for Large Language Models (LLMs) or the manipulation of textual input prompts to improve LLM performance, thus failing to meet all 'MUST' criteria.
-
----
-
-## [Generative Style Transfer for MRI Image Segmentation: A Case of Glioma
-  Segmentation in Sub-Saharan Africa](https://arxiv.org/abs/http://arxiv.org/abs/2501.04734v1)
-**arXiv ID:** http://arxiv.org/abs/2501.04734v1
-
-**Abstract:**
-> In Sub-Saharan Africa (SSA), the utilization of lower-quality Magnetic
-> Resonance Imaging (MRI) technology raises questions about the applicability of
-> machine learning methods for clinical tasks. This study aims to provide a
-> robust deep learning-based brain tumor segmentation (BraTS) method tailored for
-> the SSA population using a threefold approach. Firstly, the impact of domain
-> shift from the SSA training data on model efficacy was examined, revealing no
-> significant effect. Secondly, a comparative analysis of 3D and 2D
-> full-resolution models using the nnU-Net framework indicates similar
-> performance of both the models trained for 300 epochs achieving a five-fold
-> cross-validation score of 0.93. Lastly, addressing the performance gap observed
-> in SSA validation as opposed to the relatively larger BraTS glioma (GLI)
-> validation set, two strategies are proposed: fine-tuning SSA cases using the
-> GLI+SSA best-pretrained 2D fullres model at 300 epochs, and introducing a novel
-> neural style transfer-based data augmentation technique for the SSA cases. This
-> investigation underscores the potential of enhancing brain tumor prediction
-> within SSA's unique healthcare landscape.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on MRI image segmentation, a non-text generation task, using deep learning methods, and does not investigate, analyze, or propose methods for improving Large Language Model (LLM) performance through prompt engineering, violating multiple 'MUST NOT' criteria.
-
----
-
-## [Retrieval-Augmented Generation by Evidence Retroactivity in LLMs](https://arxiv.org/abs/http://arxiv.org/abs/2501.05475v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05475v1
-
-**Abstract:**
-> Retrieval-augmented generation has gained significant attention due to its
-> ability to integrate relevant external knowledge, enhancing the accuracy and
-> reliability of the LLMs' responses. Most of the existing methods apply a
-> dynamic multiple retrieval-generating process, to address multi-hop complex
-> questions by decomposing them into sub-problems. However, these methods rely on
-> a unidirectional forward reasoning paradigm, where errors from insufficient
-> reasoning steps or inherent flaws in current retrieval systems are
-> irreversible, potentially derailing the entire reasoning chain. For the first
-> time, this work introduces Retroactive Retrieval-Augmented Generation
-> (RetroRAG), a novel framework to build a retroactive reasoning paradigm.
-> RetroRAG revises and updates the evidence, redirecting the reasoning chain to
-> the correct direction. RetroRAG constructs an evidence-collation-discovery
-> framework to search, generate, and refine credible evidence. It synthesizes
-> inferential evidence related to the key entities in the question from the
-> existing source knowledge and formulates search queries to uncover additional
-> information. As new evidence is found, RetroRAG continually updates and
-> organizes this information, enhancing its ability to locate further necessary
-> evidence. Paired with an Answerer to generate and evaluate outputs, RetroRAG is
-> capable of refining its reasoning process iteratively until a reliable answer
-> is obtained. Empirical evaluations show that RetroRAG significantly outperforms
-> existing methods.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on introducing a novel framework for retrieval-augmented generation with a retroactive reasoning paradigm, rather than specifically engineering or optimizing prompts for Large Language Models (LLMs). While LLMs are utilized, the core subject is the enhancement of their reliability through external knowledge integration, not prompt engineering for text-based interactions.
-
----
-
-## [Language and Planning in Robotic Navigation: A Multilingual Evaluation
-  of State-of-the-Art Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.05478v1)
-**arXiv ID:** http://arxiv.org/abs/2501.05478v1
-
-**Abstract:**
-> Large Language Models (LLMs) such as GPT-4, trained on a huge number of
-> datasets spanning multiple domains, exhibit significant reasoning,
-> understanding, and planning capabilities across various tasks. This study
-> presents the first-ever work in Arabic language integration within the
-> Vision-and-Language Navigation (VLN) domain in robotics, an area that has been
-> notably underexplored in existing research. We perform a comprehensive
-> evaluation of state-of-the-art multi-lingual Small Language Models (SLMs),
-> including GPT-4o mini, Llama 3 8B, and Phi-3 medium 14B, alongside the
-> Arabic-centric LLM, Jais. Our approach utilizes the NavGPT framework, a pure
-> LLM-based instruction-following navigation agent, to assess the impact of
-> language on navigation reasoning through zero-shot sequential action prediction
-> using the R2R dataset. Through comprehensive experiments, we demonstrate that
-> our framework is capable of high-level planning for navigation tasks when
-> provided with instructions in both English and Arabic. However, certain models
-> struggled with reasoning and planning in the Arabic language due to inherent
-> limitations in their capabilities, sub-optimal performance, and parsing issues.
-> These findings highlight the importance of enhancing planning and reasoning
-> capabilities in language models for effective navigation, emphasizing this as a
-> key area for further development while also unlocking the potential of
-> Arabic-language models for impactful real-world applications.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on the evaluation of Large Language Models (LLMs) for multilingual robotic navigation, rather than on engineering, design, or optimization of prompts specifically for LLMs. While LLMs are utilized, the core subject is not prompt engineering for text-based interactions, but rather the application of LLMs in navigation tasks across different languages.
-
----
-
-## [Detection, Retrieval, and Explanation Unified: A Violence Detection
-  System Based on Knowledge Graphs and GAT](https://arxiv.org/abs/http://arxiv.org/abs/2501.06224v1)
-**arXiv ID:** http://arxiv.org/abs/2501.06224v1
-
-**Abstract:**
-> Recently, violence detection systems developed using unified multimodal
-> models have achieved significant success and attracted widespread attention.
-> However, most of these systems face two critical challenges: the lack of
-> interpretability as black-box models and limited functionality, offering only
-> classification or retrieval capabilities. To address these challenges, this
-> paper proposes a novel interpretable violence detection system, termed the
-> Three-in-One (TIO) System. The TIO system integrates knowledge graphs (KG) and
-> graph attention networks (GAT) to provide three core functionalities:
-> detection, retrieval, and explanation. Specifically, the system processes each
-> video frame along with text descriptions generated by a large language model
-> (LLM) for videos containing potential violent behavior. It employs ImageBind to
-> generate high-dimensional embeddings for constructing a knowledge graph, uses
-> GAT for reasoning, and applies lightweight time series modules to extract video
-> embedding features. The final step connects a classifier and retriever for
-> multi-functional outputs. The interpretability of KG enables the system to
-> verify the reasoning process behind each output. Additionally, the paper
-> introduces several lightweight methods to reduce the resource consumption of
-> the TIO system and enhance its efficiency. Extensive experiments conducted on
-> the XD-Violence and UCF-Crime datasets validate the effectiveness of the
-> proposed system. A case study further reveals an intriguing phenomenon: as the
-> number of bystanders increases, the occurrence of violent behavior tends to
-> decrease.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper's primary focus is on developing a unified violence detection system using knowledge graphs and GAT, with LLMs being used only as a component for generating text descriptions, rather than the central focus being on prompt engineering for text-based interactions with LLMs.
-
----
-
-## [asanAI: In-Browser, No-Code, Offline-First Machine Learning Toolkit](https://arxiv.org/abs/http://arxiv.org/abs/2501.06226v1)
-**arXiv ID:** http://arxiv.org/abs/2501.06226v1
-
-**Abstract:**
-> Machine learning (ML) has become crucial in modern life, with growing
-> interest from researchers and the public. Despite its potential, a significant
-> entry barrier prevents widespread adoption, making it challenging for
-> non-experts to understand and implement ML techniques. The increasing desire to
-> leverage ML is counterbalanced by its technical complexity, creating a gap
-> between potential and practical application. This work introduces asanAI, an
-> offline-first, open-source, no-code machine learning toolkit designed for users
-> of all skill levels. It allows individuals to design, debug, train, and test ML
-> models directly in a web browser, eliminating the need for software
-> installations and coding. The toolkit runs on any device with a modern web
-> browser, including smartphones, and ensures user privacy through local
-> computations while utilizing WebGL for enhanced GPU performance. Users can
-> quickly experiment with neural networks and train custom models using various
-> data sources, supported by intuitive visualizations of network structures and
-> data flows. asanAI simplifies the teaching of ML concepts in educational
-> settings and is released under an open-source MIT license, encouraging
-> modifications. It also supports exporting models in industry-ready formats,
-> empowering a diverse range of users to effectively learn and apply machine
-> learning in their projects. The proposed toolkit is successfully utilized by
-> researchers of ScaDS.AI to swiftly draft and test machine learning ideas, by
-> trainers to effectively educate enthusiasts, and by teachers to introduce
-> contemporary ML topics in classrooms with minimal effort and high clarity.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a no-code machine learning toolkit (asanAI) that happens to utilize ML techniques, rather than engineering prompts specifically for Large Language Models (LLMs), thus not meeting the core subject criterion.
-
----
-
-## [Balanced Multi-view Clustering](https://arxiv.org/abs/http://arxiv.org/abs/2501.02564v2)
-**arXiv ID:** http://arxiv.org/abs/2501.02564v2
-
-**Abstract:**
-> Multi-view clustering (MvC) aims to integrate information from different
-> views to enhance the capability of the model in capturing the underlying data
-> structures. The widely used joint training paradigm in MvC potentially does not
-> fully leverage the multi-view information, because the uniform learning objective
-> for all views leaves view-specific features imbalanced and under-optimized.
-> For instance, particular views with more discriminative
-> information could dominate the learning process in the joint training paradigm,
-> leading to other views being under-optimized. To alleviate this issue, we first
-> analyze the imbalanced phenomenon in the joint-training paradigm of multi-view
-> clustering from the perspective of gradient descent for each view-specific
-> feature extractor. Then, we propose a novel balanced multi-view clustering
-> (BMvC) method, which introduces a view-specific contrastive regularization
-> (VCR) to modulate the optimization of each view. Concretely, VCR preserves the
-> sample similarities captured from the joint features and view-specific ones
-> into the clustering distributions corresponding to view-specific features to
-> enhance the learning process of view-specific feature extractors. Additionally,
-> a theoretical analysis is provided to illustrate that VCR adaptively modulates
-> the magnitudes of gradients for updating the parameters of view-specific
-> feature extractors to achieve a balanced multi-view learning procedure. In such
-> a manner, BMvC achieves a better trade-off between the exploitation of
-> view-specific patterns and the exploration of view-invariance patterns to fully
-> learn the multi-view information for the clustering task. Finally, a set of
-> experiments are conducted to verify the superiority of the proposed method
-> compared with state-of-the-art approaches both on eight benchmark MvC datasets
-> and two spatially resolved transcriptomics datasets.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on multi-view clustering, a general machine learning technique, and does not meet the 'MUST' criteria for prompt engineering specifically tailored for Large Language Models (LLMs), nor does it provide examples of textual input prompts impacting LLM output.
-
----
-
-## [Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM
-  Pre-Training Datasets](https://arxiv.org/abs/http://arxiv.org/abs/2501.02628v1)
-**arXiv ID:** http://arxiv.org/abs/2501.02628v1
-
-**Abstract:**
-> A critical part of creating code suggestion systems is the pre-training of
-> Large Language Models on vast amounts of source code and natural language text,
-> often of questionable origin or quality. This may contribute to the presence of
-> bugs and vulnerabilities in code generated by LLMs. While efforts to identify
-> bugs at or after code generation exist, it is preferable to pre-train or
-> fine-tune LLMs on curated, high-quality, and compliant datasets. The need for
-> vast amounts of training data necessitates that such curation be automated,
-> minimizing human intervention.
-> We propose an automated source code autocuration technique that leverages the
-> complete version history of open-source software projects to improve the
-> quality of training data. This approach leverages the version history of all
-> OSS projects to identify training data samples that have been modified or have
-> undergone changes in at least one OSS project, and pinpoint a subset of samples
-> that include fixes for bugs or vulnerabilities. We evaluate this method using
-> The Stack v2 dataset, and find that 17% of the code versions in the dataset
-> have newer versions, with 17% of those representing bug fixes, including 2.36%
-> addressing known CVEs. The deduplicated version of Stack v2 still includes
-> blobs vulnerable to 6,947 known CVEs. Furthermore, 58% of the blobs in the
-> dataset were never modified after creation, suggesting they likely represent
-> software with minimal or no use. Misidentified blob origins present an
-> additional challenge, as they lead to the inclusion of non-permissively
-> licensed code, raising serious compliance concerns.
-> By addressing these issues, the training of new models can avoid perpetuating
-> buggy code patterns or license violations. We expect our results to inspire
-> process improvements for automated data curation, with the potential to enhance
-> the reliability of outputs generated by AI tools.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses primarily on the curation of pre-training datasets for LLMs to reduce bugs and licensing risks, rather than the engineering, design, or optimization of prompts specifically for Large Language Models, failing to meet the first 'MUST' criterion.
-
----
-
-## [Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured
-  State-Space Models](https://arxiv.org/abs/http://arxiv.org/abs/2501.02832v3)
-**arXiv ID:** http://arxiv.org/abs/2501.02832v3
-
-**Abstract:**
-> We propose Samba ASR, the first state-of-the-art Automatic Speech
-> Recognition (ASR) model leveraging the novel Mamba architecture as both encoder
-> and decoder, built on the foundation of state-space models (SSMs). Unlike
-> transformer-based ASR models, which rely on self-attention mechanisms to capture
-> dependencies, Samba ASR effectively models both local and global temporal
-> dependencies using efficient state-space dynamics, achieving remarkable
-> performance gains. By addressing the limitations of transformers, such as
-> quadratic scaling with input length and difficulty in handling long-range
-> dependencies, Samba ASR achieves superior accuracy and efficiency. Experimental
-> results demonstrate that Samba ASR surpasses existing open-source
-> transformer-based ASR models across various standard benchmarks, establishing it
-> as the new state of the art in ASR. Extensive evaluations on the benchmark
-> dataset show significant improvements in Word Error Rate (WER), with competitive
-> performance even in low-resource scenarios. Furthermore, the inherent
-> computational efficiency and parameter optimization of the Mamba architecture
-> make Samba ASR a scalable and robust solution for diverse ASR tasks. Our
-> contributions include the development of a new Samba ASR architecture for
-> automatic speech recognition (ASR), demonstrating the superiority of structured
-> state-space models (SSMs) over transformer-based models for speech sequence
-> processing. We provide a comprehensive evaluation on public
-> benchmarks, showcasing state-of-the-art (SOTA) performance, and present an in-depth
-> analysis of computational efficiency, robustness to noise, and sequence
-> generalization. This work highlights the viability of Mamba SSMs as a
-> transformer-free alternative for efficient and accurate ASR. By leveraging the
-> advancements of state-space modeling, Samba ASR redefines ASR performance
-> standards and sets a new benchmark for future research in this field.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper focuses on developing a new architecture (Mamba) for Automatic Speech Recognition (ASR), which is unrelated to Large Language Models (LLMs) and prompt engineering for text-based interactions, thus failing to meet the primary criteria.
-
----
-
-## [A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation
-  Based on Large Language Models Enhanced by Domain Knowledge Retrieval](https://arxiv.org/abs/http://arxiv.org/abs/2501.03295v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03295v2
-
-**Abstract:**
-> Data-driven soft sensors are crucial in predicting key performance indicators
-> in industrial systems. However, current methods predominantly rely on the
-> supervised learning paradigms of parameter updating, which inherently face
-> challenges such as high development costs, poor robustness, training
-> instability, and lack of interpretability. Recently, large language models
-> (LLMs) have demonstrated significant potential across various domains, notably
-> through In-Context Learning (ICL), which enables high-performance task
-> execution with minimal input-label demonstrations and no prior training. This
-> paper aims to replace supervised learning with the emerging ICL paradigm for
-> soft sensor modeling to address existing challenges and explore new avenues for
-> advancement. To achieve this, we propose a novel framework called the Few-shot
-> Uncertainty-aware and self-Explaining Soft Sensor (LLM-FUESS), which includes
-> the Zero-shot Auxiliary Variable Selector (LLM-ZAVS) and the Uncertainty-aware
-> Few-shot Soft Sensor (LLM-UFSS). The LLM-ZAVS retrieves from the Industrial
-> Knowledge Vector Storage to enhance LLMs' domain-specific knowledge, enabling
-> zero-shot auxiliary variable selection. In the LLM-UFSS, we utilize text-based
-> context demonstrations of structured data to prompt LLMs to execute ICL for
-> prediction, and propose a context sample retrieval augmentation strategy to
-> improve performance. Additionally, we explored LLMs' AIGC and probabilistic
-> characteristics to propose self-explanation and uncertainty quantification
-> methods for constructing a trustworthy soft sensor. Extensive experiments
-> demonstrate that our method achieves state-of-the-art predictive performance,
-> strong robustness, and flexibility, and effectively mitigates the training instability
-> found in traditional methods. To the best of our knowledge, this is the first
-> work to establish a soft sensor utilizing LLMs.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on replacing supervised learning with In-Context Learning for soft sensor modeling in industrial systems, utilizing LLMs as a core component within a larger system. While it mentions prompt enhancements (e.g., text-based context demonstrations, context sample retrieval augmentation), prompt engineering for text-based interactions with LLMs is not the central focus, but rather a means to achieve the paper's main objective of advancing soft sensor technology.
-
----
-
-## [FTA-FTL: A Fine-Tuned Aggregation Federated Transfer Learning Scheme for
-  Lithology Microscopic Image Classification](https://arxiv.org/abs/http://arxiv.org/abs/2501.03349v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03349v1
-
-**Abstract:**
-> Lithology discrimination is a crucial activity in characterizing oil
-> reservoirs, and processing lithology microscopic images is an essential
-> technique for investigating fossils and minerals and geological assessment of
-> shale oil exploration. In this context, Deep Learning (DL) is a powerful
-> approach for building robust classifier models. However, there is still a
-> considerable challenge to collect and produce a large dataset.
-> Transfer-learning and data augmentation techniques have emerged as popular
-> approaches to tackle this problem. Furthermore, due to different reasons,
-> especially data privacy, individuals, organizations, and industry companies
-> often are not willing to share their sensitive data and information. Federated
-> Learning (FL) has emerged to train a highly accurate central model across
-> multiple decentralized edge servers without transferring sensitive data,
-> preserving sensitive data, and enhancing security. This study involves two
-> phases; the first phase is to conduct Lithology microscopic image
-> classification on a small dataset using transfer learning. In doing so, various
-> pre-trained DL model architectures are comprehensively compared for the
-> classification task. In the second phase, we formulated the classification task
-> as a Federated Transfer Learning (FTL) scheme and proposed a Fine-Tuned
-> Aggregation strategy for Federated Learning (FTA-FTL). In order to perform a
-> comprehensive experimental study, several metrics such as accuracy, F1 score,
-> precision, specificity, sensitivity (recall), and confusion matrix are taken
-> into account. The results are in excellent agreement and confirm the efficiency
-> of the proposed scheme, and show that the proposed FTA-FTL algorithm is capable
-> of achieving approximately the same results as the centralized
-> implementation for the lithology microscopic image classification task.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper focuses on Federated Transfer Learning for image classification, primarily dealing with Deep Learning model architectures and data privacy in the context of Lithology microscopic image classification, with no mention of Large Language Models (LLMs) or prompt engineering for text-based interactions."
-}
-
----
-
-## [Not all tokens are created equal: Perplexity Attention Weighted Networks
-  for AI generated text detection](https://arxiv.org/abs/http://arxiv.org/abs/2501.03940v2)
-**arXiv ID:** http://arxiv.org/abs/2501.03940v2
-
-**Abstract:**
-> The rapid advancement in large language models (LLMs) has significantly
-> enhanced their ability to generate coherent and contextually relevant text,
-> raising concerns about the misuse of AI-generated content and making it
-> critical to detect it. However, the task remains challenging, particularly in
-> unseen domains or with unfamiliar LLMs. Leveraging LLM next-token distribution
-> outputs offers a theoretically appealing approach for detection, as they
-> encapsulate insights from the models' extensive pre-training on diverse
-> corpora. Despite its promise, zero-shot methods that attempt to operationalize
-> these outputs have met with limited success. We hypothesize that one of the
-> problems is that they use the mean to aggregate next-token distribution metrics
-> across tokens, when some tokens are naturally easier or harder to predict and
-> should be weighted differently. Based on this idea, we propose the Perplexity
-> Attention Weighted Network (PAWN), which uses the last hidden states of the LLM
-> and positions to weight the sum of a series of features based on metrics from
-> the next-token distribution across the sequence length. Although not zero-shot,
-> our method allows us to cache the last hidden states and next-token
-> distribution metrics on disk, greatly reducing the training resource
-> requirements. PAWN shows competitive and even better performance
-> in-distribution than the strongest baselines (fine-tuned LMs) with a fraction
-> of their trainable parameters. Our model also generalizes better to unseen
-> domains and source models, with smaller variability in the decision boundary
-> across distribution shifts. It is also more robust to adversarial attacks, and
-> if the backbone has multilingual capabilities, it presents decent
-> generalization to languages not seen during supervised training, with LLaMA3-1B
-> reaching a mean macro-averaged F1 score of 81.46% in cross-validation with nine
-> languages.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on developing a novel method (PAWN) for detecting AI-generated text, leveraging LLM outputs, rather than engineering or optimizing prompts for Large Language Models. Prompt manipulation for improving LLM performance is not the central concern.
-
----
-
-## [More is not always better? Enhancing Many-Shot In-Context Learning with
-  Differentiated and Reweighting Objectives](https://arxiv.org/abs/http://arxiv.org/abs/2501.04070v2)
-**arXiv ID:** http://arxiv.org/abs/2501.04070v2
-
-**Abstract:**
-> Large language models (LLMs) excel at few-shot in-context learning (ICL)
-> without requiring parameter updates. However, as the number of ICL
-> demonstrations increases from a few to many, performance tends to plateau and
-> eventually decline. We identify two primary causes for this trend: the
-> suboptimal negative log-likelihood (NLL) optimization objective and the
-> incremental data noise. To address these issues, we introduce DrICL, a novel
-> optimization method that enhances model performance through Differentiated
-> Learning and advantage-based Reweighting objectives. Globally, DrICL utilizes
-> differentiated learning to optimize the NLL objective, ensuring that many-shot
-> performance surpasses zero-shot levels. Locally, it dynamically adjusts the
-> weighting of many-shot demonstrations by leveraging cumulative advantages
-> inspired by reinforcement learning, thereby improving generalization. This
-> approach allows the model to handle varying numbers of shots effectively,
-> mitigating the impact of noisy data. Recognizing the lack of multi-task
-> datasets with diverse many-shot distributions, we develop the Many-Shot ICL
-> Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers
-> from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes.
-> ICL-50 facilitates the evaluation of many-shot ICL strategies across seven
-> prominent NLP tasks and 50 distinct datasets. Experimental results demonstrate
-> that LLMs enhanced with DrICL achieve significant improvements in many-shot
-> setups across various tasks, including both in-domain and out-of-domain
-> scenarios. We release the code and benchmark dataset hoping to facilitate
-> further research in many-shot ICL.
-
-**Decision Explanation:**
-Original response: 
-{
-    "decision": "REJECT",
-    "explanation": "The paper primarily focuses on enhancing many-shot in-context learning through novel optimization methods (Differentiated Learning and Reweighting objectives), rather than specifically on the engineering, design, or optimization of textual input prompts for Large Language Models (LLMs). The core subject is optimization of LLM performance through training methods, not prompt engineering."
-}
-
----
-
-## [Cosmos World Foundation Model Platform for Physical AI](https://arxiv.org/abs/http://arxiv.org/abs/2501.03575v1)
-**arXiv ID:** http://arxiv.org/abs/2501.03575v1
-
-**Abstract:**
-> Physical AI needs to be trained digitally first. It needs a digital twin of
-> itself, the policy model, and a digital twin of the world, the world model. In
-> this paper, we present the Cosmos World Foundation Model Platform to help
-> developers build customized world models for their Physical AI setups. We
-> position a world foundation model as a general-purpose world model that can be
-> fine-tuned into customized world models for downstream applications. Our
-> platform covers a video curation pipeline, pre-trained world foundation models,
-> examples of post-training of pre-trained world foundation models, and video
-> tokenizers. To help Physical AI builders solve the most critical problems of
-> our society, we make our platform open-source and our models open-weight with
-> permissive licenses available via https://github.com/NVIDIA/Cosmos.
-
-**Decision Explanation:**
-Original decision: REJECT
-The paper primarily focuses on a platform for Physical AI and world model development, with emphasis on video curation, model fine-tuning, and open-source resources. It does not meet the core criteria of focusing primarily on prompt engineering for Large Language Models (LLMs), manipulating textual input prompts to improve LLM performance, or providing concrete examples of prompts and their impact on LLM output.
-
----
-