We present MoConVQ, a unified framework that enables simulated avatars to acquire diverse skills from large, unstructured motion datasets. Leveraging a rich and scalable discrete skill representation, MoConVQ supports a broad range of applications, including pose estimation, interactive character control, text-to-motion generation, and, notably, the integration of motion generation with Large Language Models (LLMs).