Publications

(2023). Tokenization and the Noiseless Channel. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
(2023). On the Efficacy of Sampling Adapters. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
(2023). A Measure-theoretic Characterization of Tight Language Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
(2023). A Formal Perspective on Byte-Pair Encoding. Findings of the Association for Computational Linguistics: ACL 2023.
(2023). On the Usefulness of Embeddings, Clusters and Strings for Text Generator Evaluation. Proceedings of the 11th International Conference on Learning Representations.
(2022). Mutual Information Alleviates Hallucinations in Abstractive Summarization. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.
(2022). On the probability–quality paradox in language generation. ACL.
(2022). Estimating the Entropy of Linguistic Distributions. ACL.
(2022). Analyzing Wrap-Up Effects through an Information-Theoretic Lens. ACL.
(2022). On Decoding Strategies for Neural Text Generators. Transactions of the Association for Computational Linguistics.