PGM: Partition Generative Modeling
Reference: arXiv:2505.18883
PGM combines the strengths of autoregressive and masked generative models by partitioning tokens into two groups and using sparse attention to block information flow between them. During sampling, the model therefore needs to process only the previously generated tokens, while retaining the parallel, any-order generation capabilities of masked models, which yields significant improvements in sampling latency and throughput.
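To make the partitioning idea concrete, here is a minimal NumPy sketch of a block-sparse attention mask over two token groups. The function name and the exact masking pattern (conditioning tokens see only each other; generated-side tokens see the conditioning group plus themselves) are illustrative assumptions, not the paper's precise attention layout:

```python
import numpy as np

def partition_attention_mask(group: np.ndarray) -> np.ndarray:
    """Boolean attention mask for a sequence split into two groups.

    group[i] is 0 for tokens in the conditioning partition and 1 for
    tokens in the partition being generated. Information flow from
    group 1 back into group 0 is blocked: group-0 queries never attend
    to group-1 keys. (Illustrative scheme, not the paper's exact mask.)
    """
    n = len(group)
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        for k in range(n):
            if group[q] == 0:
                # Conditioning tokens attend only to other conditioning tokens.
                mask[q, k] = group[k] == 0
            else:
                # Generated-side tokens see the conditioning group and themselves.
                mask[q, k] = (group[k] == 0) or (q == k)
    return mask

# Example: alternating partition assignment for a 4-token sequence.
mask = partition_attention_mask(np.array([0, 1, 0, 1]))
```

Because group-0 rows are independent of group-1 tokens, the group-0 computation can be cached and reused across sampling steps, which is where the latency savings come from.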
Usage
Train on OpenWebText: