Ask HN: Best foundation model for CLM fine-tuning? 8 points by philomath868 | 0 comments on Hacker News.

Hi, I have a largish (2 GB) corpus of curated, high-quality text in a low-resource language, and I want to build a model that provides an advanced "auto complete" service for writers. I'm thinking of taking a decoder-only model such as Llama, Mistral, or Gemma, slicing off the embedding layers (which are tied to languages I don't need), creating new ones (perhaps initialized from a FastText model trained on the corpus) paired with a tokenizer newly built from my corpus, and then training the model on my corpus until convergence.

Additional potential details include:

- a custom loss function for synonym-aware training (based on a custom high-quality thesaurus), where synonyms of the "correct" word are somewhat rewarded;
- POS-tagging the corpus with a language-specific POS tagger, and adding a POS-tagging head to the model as multi-task learning, to force grammatical g...
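The embedding-swap step could be sketched roughly as follows, independent of any particular model library. This is a minimal sketch, not the implementation: `vocab` is assumed to be the token-to-index map from the newly trained tokenizer, and `fasttext_vecs` a plain dict of corpus-trained FastText vectors (both names are hypothetical). Tokens that FastText doesn't cover fall back to small random vectors.

```python
import numpy as np

def init_embedding_matrix(vocab, fasttext_vecs, dim, seed=0):
    """Build a fresh embedding matrix for a newly trained tokenizer.

    vocab: dict mapping token string -> row index (from the new tokenizer)
    fasttext_vecs: dict mapping token string -> np.ndarray of shape (dim,)
                   (from a FastText model trained on the same corpus)
    Returns the matrix and the number of rows covered by FastText.
    """
    rng = np.random.default_rng(seed)
    # Small-variance random init for tokens FastText doesn't know.
    mat = rng.normal(0.0, 0.02, size=(len(vocab), dim))
    hits = 0
    for tok, idx in vocab.items():
        vec = fasttext_vecs.get(tok)
        if vec is not None:
            mat[idx] = vec  # copy the pretrained vector into its row
            hits += 1
    return mat, hits
```

The resulting matrix would then be copied into the model's (resized) input embedding layer before training; with a framework like Hugging Face Transformers that usually also means resizing or retraining the tied output head, since the old vocabulary no longer applies.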