BERNICE and SAM: Using Language Models to Measure Coherence and Meaning-Preservation in Automatic Text Simplification


Automatic text simplification (ATS) transforms text so that it becomes easier to read, comprehend, and process. In lexical simplification, complicated words are replaced with simpler synonyms; in syntactic simplification, complex sentences are split into simpler ones while preserving meaning. ATS is typically a mixture of both.

While ATS has made great strides in recent years, simplified text does not always preserve the cohesion and meaning of the original text. We propose the first metrics to directly measure how well sentence-to-sentence cohesion and meaning are preserved during automatic simplification. Both metrics are built on pretrained transformer language models. BERNICE (BERt Nsp Inference for Cohesion Evaluation) uses BERT's next-sentence prediction (NSP) to compare sentence-to-sentence cohesion between the source and the simplified text. SAM (Sentence-level question-Answering as Meaning-preservation metric) measures meaning preservation by comparing simulated reading comprehension (question answering) on the original and the simplified text.
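To make the two ideas concrete, here is a minimal sketch of how such comparisons could be computed. It is not the BERNICE or SAM implementation: the interfaces `nsp(a, b)` (probability that sentence `b` follows sentence `a`) and `qa(question, context)` (an extracted answer string), the functions `cohesion`, `bernice_sketch`, and `sam_sketch`, and the aggregation choices (mean NSP probability over adjacent pairs; fraction of questions answered identically on both texts) are all hypothetical assumptions. In practice `nsp` might be backed by a BERT next-sentence-prediction head such as Hugging Face's `BertForNextSentencePrediction`, but here toy stand-ins are used so the sketch runs without downloading a model.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumed interfaces (hypothetical, not the published BERNICE/SAM API):
#   nsp(a, b) -> probability that sentence b follows sentence a
#   qa(question, context) -> answer string extracted from context
NspScorer = Callable[[str, str], float]
QaModel = Callable[[str, str], str]


def cohesion(sentences: Sequence[str], nsp: NspScorer) -> float:
    """Mean next-sentence probability over adjacent sentence pairs."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to score cohesion")
    return mean(nsp(a, b) for a, b in zip(sentences, sentences[1:]))


def bernice_sketch(source: Sequence[str], simplified: Sequence[str], nsp: NspScorer) -> float:
    """Cohesion change; a negative value means the simplification lost cohesion."""
    return cohesion(simplified, nsp) - cohesion(source, nsp)


def _norm(answer: str) -> str:
    # Light normalization so trivial case/whitespace differences still match.
    return answer.strip().lower()


def sam_sketch(source: str, simplified: str, questions: Sequence[str], qa: QaModel) -> float:
    """Fraction of questions answered identically on the source and the simplified text."""
    if not questions:
        raise ValueError("need at least one question")
    agree = sum(_norm(qa(q, simplified)) == _norm(qa(q, source)) for q in questions)
    return agree / len(questions)


# Toy stand-ins (hypothetical) so the sketch runs without a language model.
def toy_nsp(a: str, b: str) -> float:
    """Pretend that sentences sharing a word are cohesive."""
    return 0.9 if set(a.lower().split()) & set(b.lower().split()) else 0.1


def toy_qa(question: str, context: str) -> str:
    """Pretend the answer is the question keyword if it occurs in the context."""
    return question if question.lower() in context.lower() else ""


if __name__ == "__main__":
    src = ["The cat sat on the mat.", "The cat then slept."]
    simp = ["A cat sat.", "Birds sang."]
    print(bernice_sketch(src, simp, toy_nsp))
    print(sam_sketch(" ".join(src), " ".join(simp), ["cat", "mat"], toy_qa))
```

Comparing the two texts with the same scorer, rather than scoring the simplified text alone, keeps the sketch focused on *preservation*: a source that was already weakly cohesive is not penalized again after simplification.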

We have implemented both metrics and integrated them into the Easier Automatic Sentence Simplification Evaluation (EASSE) package for convenience. We also provide a web app for visualizing BERNICE and SAM results. We believe that directly measuring cohesion and meaning preservation will help researchers develop better ATS systems.

Haoyu He - 何灏宇
PhD Student

My research interests include natural language understanding, machine translation, conversational agents, and causal inference.