Large Language Model Alignment

Language models do not always give answers that match what people expect. This research teaches models to respond in safer and more helpful ways using feedback from people. We also design simple ways to test how well the models are working and to make the training process easier.

Pre-trained language models (PrLMs) trained via contrastive learning have achieved state-of-the-art performance on various natural language processing (NLP) tasks. Most PrLMs for sentence embedding focus on context similarity as the objective of contrastive learning. However, we found that these PrLMs, including recently released large language models (LLMs) such as LLaMA, underperform when analyzing syntactic information on probing tasks. This limitation becomes particularly noticeable in applications that depend on nuanced sentence understanding, such as the Retrieval-Augmented Generation (RAG) framework for LLMs. This paper introduces a new sentence embedding model named SynCSE: Syntax Graph-based Contrastive Learning of Sentence Embeddings. Our approach enables meaningful sentence embeddings of language models through learning syntactic features. To accomplish this, we train a PrLM with graph neural networks (GNNs) that receive a directed syntax graph. We then detach the additional GNN layers from the PrLM for inference, so inference does not require a syntax graph. The proposed model improves on baselines in semantic textual similarity (STS) tasks, transfer tasks, and especially probing tasks. Additionally, we observe that our model has improved alignment and competitive uniformity compared to the baseline.
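
As a rough illustration of the training/inference split described above, the sketch below pairs a PrLM's token states with a single simplified graph-convolution layer over a directed syntax graph and a standard in-batch contrastive (InfoNCE) loss. All names are our own, and the one-layer mean-aggregation GCN is a stand-in assumption for the paper's GNN, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntaxGCNLayer(nn.Module):
    """One simplified message-passing step over a directed syntax graph."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, token_states, adj):
        # token_states: (batch, seq, dim); adj: (batch, seq, seq), where
        # adj[b, i, j] = 1 if token j is a dependency neighbor of token i.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)   # avoid division by zero
        neighbors = (adj @ token_states) / deg           # mean over graph neighbors
        return F.relu(self.linear(neighbors) + token_states)

def info_nce(z1, z2, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss between two embedding views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = (z1 @ z2.t()) / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Training embeds with GNN(PrLM(tokens), syntax_adj); at inference the GNN
# layers are detached and the PrLM alone produces the sentence embedding.
```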

Small language models (SLMs) are increasingly utilized for on-device applications due to their ability to ensure user privacy, reduce inference latency, and operate independently of cloud infrastructure. However, their performance is often limited when processing complex data structures such as graphs, which are ubiquitous in real-world datasets like social networks and system interactions. Graphs inherently encode intricate structural dependencies, requiring models to effectively capture both local and global relationships. Traditional language models, designed primarily for text data, struggle to meet these requirements, leading to suboptimal performance on graph-related tasks. To overcome this limitation, we propose a novel graph encoder-based prompt tuning framework that integrates a graph convolutional network (GCN) with a graph transformer. By leveraging the complementary strengths of the GCN for local structural modeling and the graph transformer for capturing global relationships, our method enables SLMs to effectively process graph data. This integration significantly enhances the ability of SLMs to handle graph-centric tasks while maintaining the efficiency required for resource-constrained devices. The experimental results show that our approach not only improves the performance of SLMs on various graph benchmarks but also achieves results that closely approach the performance of a large language model (LLM). This work highlights the potential of extending SLMs to graph-based applications and advancing the capabilities of on-device artificial intelligence.
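
A hedged sketch of how a GCN and a graph transformer might be combined into a prompt encoder, as the abstract describes: the GCN aggregates local neighborhoods, a transformer layer attends globally over all nodes, and the fused node states are projected into soft prompt vectors for the SLM. The module choices, dimensions, and crude top-k node pooling below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphPromptEncoder(nn.Module):
    def __init__(self, node_dim, lm_dim, n_prompt_tokens=8):
        super().__init__()
        self.gcn = nn.Linear(node_dim, node_dim)            # local message passing
        # global relationships via self-attention over nodes
        # (node_dim must be divisible by nhead)
        self.global_attn = nn.TransformerEncoderLayer(
            d_model=node_dim, nhead=4, batch_first=True)
        self.to_prompt = nn.Linear(node_dim, lm_dim)
        self.n_prompt_tokens = n_prompt_tokens

    def forward(self, node_feats, adj):
        # node_feats: (batch, nodes, node_dim); adj: (batch, nodes, nodes)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        local = torch.relu(self.gcn((adj @ node_feats) / deg))  # GCN step
        fused = self.global_attn(local)                         # graph transformer
        # Crude pooling: keep the first k node states as soft prompt tokens.
        prompts = fused[:, : self.n_prompt_tokens, :]
        return self.to_prompt(prompts)  # prepend to the SLM's input embeddings
```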

Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred for deriving sentence representations from PLMs. This paper introduces an attention-based pooling strategy, which enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. The contrastive learning objective can adapt layer-wise attention pooling to both unsupervised and supervised settings. This regularizes the anisotropic space of pre-trained embeddings, making it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improves the performance of contrastively learned BERT_base and its variants.
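
The core idea lends itself to a short sketch: learn a weight per layer and mix the hidden states from every layer instead of taking only the last one. The scalar-softmax weighting below is one simple realization of attention over layers and an assumption on our part; the paper's pooling may differ.

```python
import torch
import torch.nn as nn

class LayerAttentionPooling(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # one learnable logit per layer (e.g., 13 for BERT_base incl. embeddings)
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: tuple of (batch, seq, dim) tensors, one per layer,
        # e.g. from AutoModel(..., output_hidden_states=True).hidden_states
        stacked = torch.stack(hidden_states, dim=0)           # (L, B, S, D)
        weights = torch.softmax(self.layer_logits, dim=0)     # attention over layers
        mixed = (weights[:, None, None, None] * stacked).sum(dim=0)
        return mixed.mean(dim=1)  # mean-pool over tokens -> sentence embedding
```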

Unlike previous dialogue-based question-answering (QA) datasets, DREAM, a multiple-choice Dialogue-based REAding comprehension exaMination dataset, requires a deep understanding of dialogue. Many problems require multi-sentence reasoning, whereas some require commonsense reasoning. However, most pre-trained language models (PTLMs) do not consider commonsense knowledge. In addition, because the maximum number of tokens that a language model (LM) can handle is limited, the entire dialogue history cannot be included. The resulting information loss has an adverse effect on performance. To address these problems, we propose a Dialogue-based QA model with Common-sense Reasoning (DQACR), a language model that exploits Semantic Search and continual learning. We used Semantic Search to compensate for the information lost when dialogue is truncated. In addition, we used Semantic Search and continual learning to improve the PTLM's commonsense reasoning. Our model achieves an improvement of approximately 1.5% over the baseline method and can thus facilitate QA-related tasks. It contributes not only to dialogue-based QA tasks but also to other forms of QA datasets in future work.
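
A minimal sketch of the Semantic Search step, assuming an off-the-shelf sentence encoder from the sentence-transformers library: rank the dialogue turns that truncation would discard by embedding similarity to the question, and re-attach the most relevant ones. The model choice and selection logic are illustrative, not the paper's exact pipeline.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_relevant_turns(question, dropped_turns, top_k=3):
    """Return the k truncated-away dialogue turns most similar to the question."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    t_emb = encoder.encode(dropped_turns, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, t_emb)[0]              # cosine similarity per turn
    best = scores.topk(min(top_k, len(dropped_turns))).indices.tolist()
    return [dropped_turns[i] for i in sorted(best)]     # keep original dialog order
```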

Generative commonsense reasoning refers to the ability of a language model to generate a sentence from a given concept set using compositional generalization and commonsense reasoning. In the CommonGen challenge, which evaluates this capability, language models continue to exhibit low performance and struggle to leverage human knowledge representations. Therefore, we propose PU-GEN, which leverages human-centered knowledge in language models to enhance compositional generalization and commonsense reasoning in light of the human language generation process. To incorporate human-centered knowledge, PU-GEN reinterprets two linguistic philosophies from Wittgenstein: picture theory and use theory. First, we retrieve scene knowledge to reflect picture theory, such that the model can describe a general situation as if it were being painted. Second, we extend relational knowledge to consider use theory for understanding various contexts. PU-GEN demonstrates superior performance over baseline models in qualitative and quantitative evaluations on CommonGen and generates convincing evidence for CommonsenseQA. Moreover, it outperforms the state-of-the-art model from the previous CommonGen challenge.
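
As a rough illustration of the two knowledge sources described above, retrieved scene knowledge and relational knowledge might be packed into the generator's input alongside the concept set, as sketched below. The prompt format, the retrieval sources, and the example triple are assumptions for illustration only, not PU-GEN's actual input scheme.

```python
def build_commongen_input(concepts, scene_sentence, relations):
    """concepts: list[str]; relations: list of (head, relation, tail) triples."""
    rel_text = " ".join(f"{h} {r} {t}." for h, r, t in relations)
    return (f"concepts: {' '.join(concepts)} "
            f"scene: {scene_sentence} "          # picture theory: a painted scene
            f"relations: {rel_text}")            # use theory: contexts of use

example = build_commongen_input(
    ["dog", "frisbee", "catch"],
    "A dog leaps in a park to catch a frisbee.",   # retrieved scene sentence
    [("dog", "CapableOf", "catch frisbee")])       # e.g., a ConceptNet-style triple
print(example)
```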

Humans usually converse by drawing on prior knowledge about a topic and background information about the people they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, so they are limited in generating utterances in which knowledge and persona are properly fused. To address this issue, we introduce the call For Customized conversation (FoCus) dataset, in which customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the ability of pre-trained language models to make informative and customized utterances, we utilize BART and GPT-2 as well as other transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the models reflect adequate persona and knowledge with our two proposed sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances in our data are constructed with the proper knowledge and persona through a grounding quality assessment.
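
The persona grounding (PG) sub-task can be pictured as a per-sentence binary decision: does the generated answer use each persona sentence? The head below is a minimal sketch of such a scorer over precomputed embeddings; it is a hypothetical illustration, not the dataset's official baseline.

```python
import torch
import torch.nn as nn

class PersonaGroundingHead(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim * 2, 1)   # scores an (answer, persona) pair

    def forward(self, answer_emb, persona_embs):
        # answer_emb: (B, D); persona_embs: (B, P, D) for P persona sentences
        expanded = answer_emb[:, None, :].expand_as(persona_embs)
        pair = torch.cat([expanded, persona_embs], dim=-1)
        return self.scorer(pair).squeeze(-1)  # (B, P) grounding logits
```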

In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) have shown significant improvements on various natural language processing tasks. This and similar response selection tasks can also be solved with such language models by formulating them as dialog-response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of the history and the candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search) that aid the response selection model in maintaining dialog coherence. Further, UMS are self-supervised methods that require no additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models pushing the state-of-the-art by significant margins on multiple public benchmark datasets.
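
The three strategies are easy to picture as self-supervised example builders over a dialog (a list of utterance strings), as in the sketch below. The exact target formats, special tokens, and sampling details in the paper may differ; this is only a hedged illustration of the idea.

```python
import random

def ums_insertion(dialog):
    """Remove one utterance; the model must find where it belongs."""
    i = random.randrange(len(dialog))
    target, context = dialog[i], dialog[:i] + dialog[i + 1:]
    return {"context": context, "utterance": target, "answer_position": i}

def ums_deletion(dialog, random_pool):
    """Insert a random foreign utterance; the model must detect it."""
    i = random.randrange(len(dialog) + 1)
    noisy = dialog[:i] + [random.choice(random_pool)] + dialog[i:]
    return {"context": noisy, "answer_position": i}

def ums_search(dialog):
    """Shuffle earlier turns; the model must find the true previous turn."""
    *context, response = dialog          # requires at least two utterances
    true_prev = context[-1]
    random.shuffle(context)
    return {"context": context, "response": response,
            "answer_position": context.index(true_prev)}
```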

We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bi-directional Encoder Representations from Transformers (BERT) for a multi-turn dialog system and propose a highly effective post-training method on a domain-specific corpus. Although BERT is easily adapted to various NLP tasks and outperforms the previous baselines of each task, it still has limitations if the task corpus is too focused on a certain domain. Post-training on a domain-specific corpus (e.g., Ubuntu Corpus) helps the model learn contextualized representations and words that do not appear in a general corpus (e.g., English Wikipedia). Experimental results show that our approach achieves a new state-of-the-art on two response selection benchmarks (i.e., Ubuntu Corpus V1 and Advising Corpus), improving R@1 by 5.9% and 6%, respectively.
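
A hedged sketch of such domain-adaptive post-training with the Hugging Face transformers library: continue masked-language-model training on a domain corpus before fine-tuning for response selection. The file path and hyperparameters are placeholders, and the original BERT recipe also includes next-sentence prediction, which is omitted here for brevity.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder path: one domain utterance/document per line (e.g., Ubuntu logs).
corpus = load_dataset("text", data_files={"train": "ubuntu_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-post-ubuntu",
                           per_device_train_batch_size=32,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    # dynamic masking of 15% of tokens, as in standard MLM training
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()  # the post-trained checkpoint is then fine-tuned on response selection
```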

Keywords: Response selection, Human-computer dialog system, Spoken language processing

