Large Language Model Alignment

Language models do not always give answers that match what people expect. This research teaches models to respond in safer and more helpful ways using feedback from people. We also design simple ways to test how well the models are working and to make the training process easier.

Pre-trained language models (PrLMs) trained via contrastive learning have achieved state-of-the-art performance on various natural language processing (NLP) tasks. Most PrLMs for sentence embedding focus on context similarity as the objective of contrastive learning. However, we found that these PrLMs, including recently released large language models (LLMs) such as LLaMA, underperform when analyzing syntactic information on probing tasks. This limitation becomes particularly noticeable in applications that depend on nuanced sentence understanding, such as the Retrieval-Augmented Generation (RAG) framework for LLMs. This paper introduces a new sentence embedding model named SynCSE: Syntax Graph-based Contrastive Learning of Sentence Embeddings. Our approach enables meaningful sentence embeddings of language models through learning syntactic features. To accomplish this, we train a PrLM with graph neural networks (GNNs) that receive a directed syntax graph. We then detach the additional GNN layers from the PrLM for inference, so inference does not require a syntax graph. The proposed model improves on baselines in semantic textual similarity (STS) tasks, transfer tasks, and especially probing tasks. Additionally, we observe that our model has improved alignment and competitive uniformity compared to the baseline.
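
As a rough illustration of the training/inference split described above, the sketch below pairs a PrLM's token states with a single simplified graph-convolution layer over a directed syntax graph and a standard in-batch contrastive (InfoNCE) loss. All names are our own, and the one-layer mean-aggregation GCN is a stand-in assumption for the paper's GNN, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntaxGCNLayer(nn.Module):
    """One simplified message-passing step over a directed syntax graph."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, token_states, adj):
        # token_states: (batch, seq, dim); adj: (batch, seq, seq), where
        # adj[b, i, j] = 1 if token j is a dependency neighbor of token i.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)   # avoid division by zero
        neighbors = (adj @ token_states) / deg           # mean over graph neighbors
        return F.relu(self.linear(neighbors) + token_states)

def info_nce(z1, z2, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss between two embedding views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = (z1 @ z2.t()) / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Training embeds with GNN(PrLM(tokens), syntax_adj); at inference the GNN
# layers are detached and the PrLM alone produces the sentence embedding.
```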

Small language models (SLMs) are increasingly utilized for on-device applications due to their ability to ensure user privacy, reduce inference latency, and operate independently of cloud infrastructure. However, their performance is often limited when processing complex data structures such as graphs, which are ubiquitous in real-world datasets like social networks and system interactions. Graphs inherently encode intricate structural dependencies, requiring models to effectively capture both local and global relationships. Traditional language models, designed primarily for text data, struggle to meet these requirements, leading to suboptimal performance on graph-related tasks. To overcome this limitation, we propose a novel graph encoder-based prompt tuning framework that integrates a graph convolutional network (GCN) with a graph transformer. By leveraging the complementary strengths of the GCN for local structural modeling and the graph transformer for capturing global relationships, our method enables SLMs to effectively process graph data. This integration significantly enhances the ability of SLMs to handle graph-centric tasks while maintaining the efficiency required for resource-constrained devices. The experimental results show that our approach not only improves the performance of SLMs on various graph benchmarks but also achieves results that closely approach the performance of a large language model (LLM). This work highlights the potential of extending SLMs to graph-based applications and advancing the capabilities of on-device artificial intelligence.
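
A hedged sketch of how a GCN and a graph transformer might be combined into a prompt encoder, as the abstract describes: the GCN aggregates local neighborhoods, a transformer layer attends globally over all nodes, and the fused node states are projected into soft prompt vectors for the SLM. The module choices, dimensions, and crude top-k node pooling below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphPromptEncoder(nn.Module):
    def __init__(self, node_dim, lm_dim, n_prompt_tokens=8):
        super().__init__()
        self.gcn = nn.Linear(node_dim, node_dim)            # local message passing
        # global relationships via self-attention over nodes
        # (node_dim must be divisible by nhead)
        self.global_attn = nn.TransformerEncoderLayer(
            d_model=node_dim, nhead=4, batch_first=True)
        self.to_prompt = nn.Linear(node_dim, lm_dim)
        self.n_prompt_tokens = n_prompt_tokens

    def forward(self, node_feats, adj):
        # node_feats: (batch, nodes, node_dim); adj: (batch, nodes, nodes)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        local = torch.relu(self.gcn((adj @ node_feats) / deg))  # GCN step
        fused = self.global_attn(local)                         # graph transformer
        # Crude pooling: keep the first k node states as soft prompt tokens.
        prompts = fused[:, : self.n_prompt_tokens, :]
        return self.to_prompt(prompts)  # prepend to the SLM's input embeddings
```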

Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred for deriving sentence representations from PLMs. This paper introduces an attention-based pooling strategy, which enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. The contrastive learning objective can adapt layer-wise attention pooling to both unsupervised and supervised settings. This regularizes the anisotropic space of pre-trained embeddings, making it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improves the performance of contrastively learned BERT_base and its variants.
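
The core idea lends itself to a short sketch: learn a weight per layer and mix the hidden states from every layer instead of taking only the last one. The scalar-softmax weighting below is one simple realization of attention over layers and an assumption on our part; the paper's pooling may differ.

```python
import torch
import torch.nn as nn

class LayerAttentionPooling(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # one learnable logit per layer (e.g., 13 for BERT_base incl. embeddings)
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: tuple of (batch, seq, dim) tensors, one per layer,
        # e.g. from AutoModel(..., output_hidden_states=True).hidden_states
        stacked = torch.stack(hidden_states, dim=0)           # (L, B, S, D)
        weights = torch.softmax(self.layer_logits, dim=0)     # attention over layers
        mixed = (weights[:, None, None, None] * stacked).sum(dim=0)
        return mixed.mean(dim=1)  # mean-pool over tokens -> sentence embedding
```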

Unlike previous dialogue-based question-answering (QA) datasets, DREAM, a multiple-choice Dialogue-based REAding comprehension exaMination dataset, requires a deep understanding of dialogue. Many problems require multi-sentence reasoning, whereas some require commonsense reasoning. However, most pre-trained language models (PTLMs) do not consider commonsense knowledge. In addition, because the maximum number of tokens that a language model (LM) can handle is limited, the entire dialogue history cannot be included. The resulting information loss has an adverse effect on performance. To address these problems, we propose a Dialogue-based QA model with Common-sense Reasoning (DQACR), a language model that exploits Semantic Search and continual learning. We used Semantic Search to compensate for the information lost when dialogue is truncated. In addition, we used Semantic Search and continual learning to improve the PTLM's commonsense reasoning. Our model achieves an improvement of approximately 1.5% over the baseline method and can thus facilitate QA-related tasks. It contributes not only to dialogue-based QA tasks but also to other forms of QA datasets in future work.
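
A minimal sketch of the Semantic Search step, assuming an off-the-shelf sentence encoder from the sentence-transformers library: rank the dialogue turns that truncation would discard by embedding similarity to the question, and re-attach the most relevant ones. The model choice and selection logic are illustrative, not the paper's exact pipeline.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_relevant_turns(question, dropped_turns, top_k=3):
    """Return the k truncated-away dialogue turns most similar to the question."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    t_emb = encoder.encode(dropped_turns, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, t_emb)[0]              # cosine similarity per turn
    best = scores.topk(min(top_k, len(dropped_turns))).indices.tolist()
    return [dropped_turns[i] for i in sorted(best)]     # keep original dialog order
```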

Generative commonsense reasoning refers to the ability of a language model to generate a sentence from a given concept set using compositional generalization and commonsense reasoning. In the CommonGen challenge, which evaluates this capability, language models continue to exhibit low performance and struggle to leverage human knowledge representations. Therefore, we propose PU-GEN, which leverages human-centered knowledge in language models to enhance compositional generalization and commonsense reasoning in light of the human language generation process. To incorporate human-centered knowledge, PU-GEN reinterprets two linguistic philosophies from Wittgenstein: picture theory and use theory. First, we retrieve scene knowledge to reflect picture theory, such that the model can describe a general situation as if it were being painted. Second, we extend relational knowledge to consider use theory for understanding various contexts. PU-GEN demonstrates superior performance over baseline models in qualitative and quantitative evaluations on CommonGen and generates convincing evidence for CommonsenseQA. Moreover, it outperforms the state-of-the-art model from the previous CommonGen challenge.
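
As a rough illustration of the two knowledge sources described above, retrieved scene knowledge and relational knowledge might be packed into the generator's input alongside the concept set, as sketched below. The prompt format, the retrieval sources, and the example triple are assumptions for illustration only, not PU-GEN's actual input scheme.

```python
def build_commongen_input(concepts, scene_sentence, relations):
    """concepts: list[str]; relations: list of (head, relation, tail) triples."""
    rel_text = " ".join(f"{h} {r} {t}." for h, r, t in relations)
    return (f"concepts: {' '.join(concepts)} "
            f"scene: {scene_sentence} "          # picture theory: a painted scene
            f"relations: {rel_text}")            # use theory: contexts of use

example = build_commongen_input(
    ["dog", "frisbee", "catch"],
    "A dog leaps in a park to catch a frisbee.",   # retrieved scene sentence
    [("dog", "CapableOf", "catch frisbee")])       # e.g., a ConceptNet-style triple
print(example)
```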

Humans usually converse by drawing on prior knowledge about a topic and background information about the people they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, so they are limited in generating utterances in which knowledge and persona are properly fused. To address this issue, we introduce the call For Customized conversation (FoCus) dataset, in which customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the ability of pre-trained language models to make informative and customized utterances, we utilize BART and GPT-2 as well as other transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the models reflect adequate persona and knowledge with our two proposed sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances in our data are constructed with the proper knowledge and persona through a grounding quality assessment.
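
The persona grounding (PG) sub-task can be pictured as a per-sentence binary decision: does the generated answer use each persona sentence? The head below is a minimal sketch of such a scorer over precomputed embeddings; it is a hypothetical illustration, not the dataset's official baseline.

```python
import torch
import torch.nn as nn

class PersonaGroundingHead(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim * 2, 1)   # scores an (answer, persona) pair

    def forward(self, answer_emb, persona_embs):
        # answer_emb: (B, D); persona_embs: (B, P, D) for P persona sentences
        expanded = answer_emb[:, None, :].expand_as(persona_embs)
        pair = torch.cat([expanded, persona_embs], dim=-1)
        return self.scorer(pair).squeeze(-1)  # (B, P) grounding logits
```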

In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) have shown significant improvements on various natural language processing tasks. This and similar response selection tasks can also be solved with such language models by formulating them as dialog-response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of the history and the candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search) that aid the response selection model in maintaining dialog coherence. Further, UMS are self-supervised methods that require no additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models pushing the state-of-the-art by significant margins on multiple public benchmark datasets.
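
The three strategies are easy to picture as self-supervised example builders over a dialog (a list of utterance strings), as in the sketch below. The exact target formats, special tokens, and sampling details in the paper may differ; this is only a hedged illustration of the idea.

```python
import random

def ums_insertion(dialog):
    """Remove one utterance; the model must find where it belongs."""
    i = random.randrange(len(dialog))
    target, context = dialog[i], dialog[:i] + dialog[i + 1:]
    return {"context": context, "utterance": target, "answer_position": i}

def ums_deletion(dialog, random_pool):
    """Insert a random foreign utterance; the model must detect it."""
    i = random.randrange(len(dialog) + 1)
    noisy = dialog[:i] + [random.choice(random_pool)] + dialog[i:]
    return {"context": noisy, "answer_position": i}

def ums_search(dialog):
    """Shuffle earlier turns; the model must find the true previous turn."""
    *context, response = dialog          # requires at least two utterances
    true_prev = context[-1]
    random.shuffle(context)
    return {"context": context, "response": response,
            "answer_position": context.index(true_prev)}
```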

We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bi-directional Encoder Representations from Transformers (BERT) for a multi-turn dialog system and propose a highly effective post-training method on a domain-specific corpus. Although BERT is easily adapted to various NLP tasks and outperforms the previous baselines of each task, it still has limitations if the task corpus is too focused on a certain domain. Post-training on a domain-specific corpus (e.g., Ubuntu Corpus) helps the model learn contextualized representations and words that do not appear in a general corpus (e.g., English Wikipedia). Experimental results show that our approach achieves a new state-of-the-art on two response selection benchmarks (i.e., Ubuntu Corpus V1 and Advising Corpus), improving R@1 by 5.9% and 6%, respectively.
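
A hedged sketch of such domain-adaptive post-training with the Hugging Face transformers library: continue masked-language-model training on a domain corpus before fine-tuning for response selection. The file path and hyperparameters are placeholders, and the original BERT recipe also includes next-sentence prediction, which is omitted here for brevity.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder path: one domain utterance/document per line (e.g., Ubuntu logs).
corpus = load_dataset("text", data_files={"train": "ubuntu_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-post-ubuntu",
                           per_device_train_batch_size=32,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    # dynamic masking of 15% of tokens, as in standard MLM training
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()  # the post-trained checkpoint is then fine-tuned on response selection
```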

Keywords: Response selection, Human-computer dialog system, Spoken language processing

