Clara Meister
A Formal Perspective on Byte-Pair Encoding
Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data in NLP, despite being devised initially as a compression …
Vilém Zouhar, Clara Meister, Juan Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell
A Measure-theoretic Characterization of Tight Language Models
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most …
Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell
On the Efficacy of Sampling Adapters
Sampling-based decoding strategies are widely employed for generating text from probabilistic models, yet standard ancestral sampling …
Clara Meister, Tiago Pimentel, Luca Malagutti, Ryan Cotterell
Tokenization and the Noiseless Channel
Subword tokenization is a key part of most NLP pipelines. However, little is known about why some tokenizer and hyperparameter …
Vilém Zouhar, Clara Meister, Juan Gastaldi, Li Du, Mrinmaya Sachan, Ryan Cotterell
On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation
A good automatic evaluation metric for language generation ideally correlates highly with human judgements of text quality. Yet, there …
Tiago Pimentel, Clara Meister, Ryan Cotterell
Mutual Information and Hallucinations in Abstractive Summarization
Despite significant progress in the quality of language generated from abstractive summarization models, these models still exhibit the …
Liam van der Poel, Ryan Cotterell, Clara Meister
Analyzing Wrap-Up Effects through an Information-Theoretic Lens
Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger P. Levy
Estimating the Entropy of Linguistic Distributions
Aryaman Arora, Clara Isabel Meister, Ryan Cotterell
On the probability–quality paradox in language generation
Clara Isabel Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell
Cluster-based Evaluation of Automatically Generated Text
While probabilistic language generators have improved dramatically over the last few years, the automatic evaluation metrics used to …
Tiago Pimentel*, Clara Meister*, Ryan Cotterell