Ask HN: Are there any movements to create the equivalent of a sitemap for LLMs?
2 by thomasfromcdnjs | 0 comments on Hacker News.
I've been working on an indigenous language project for a while now -> https://ift.tt/o20Tt8Z

It currently fits inside the token limit of gpt-4o (120k tokens), so I am able to inject it into the prompt and make a translator-like bot that has amazing results -> https://ift.tt/filLGHj (this is above and beyond good enough for the project goals). The problem is that I am paying for 100k+ tokens per translation.

Other than the GitHub YAML version of the dict, I have a publicly indexed (Google) HTML version of it -> https://ift.tt/OHpQI5M But obviously, if the models had been trained on this data, they would already have an intrinsic knowledge of it and I would have to inject a lot less into the prompt.

I know that models have a training cutoff for every iteration, but is there a way to ensure that you are crawled before the next one? (I'm asking in the context of OpenAI, but I'm curious about answers for any other models.)

Essentially, is there an equivalent of Google Webmaster Tools where I can submit a sitemap, check the progress of crawls, or submit individual pages? If there isn't, are there any movements to create such a resource?
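
One precondition for being crawled at all can be checked locally: whether the site's robots.txt lets a given crawler fetch the dictionary pages. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.org` URLs, and the helper name `crawler_allowed` are all hypothetical, and `GPTBot` is assumed here to be the user-agent token of OpenAI's crawler. Passing this check only means the crawler is not blocked; it says nothing about whether or when a page is actually crawled or used for training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; substitute the real file's text.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /private/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "GPTBot" is an assumed crawler token; "OtherBot" is a made-up name
    # that falls through to the wildcard rules.
    for agent in ("GPTBot", "OtherBot"):
        for path in ("/", "/private/notes.html"):
            url = "https://example.org" + path
            print(agent, path, crawler_allowed(ROBOTS_TXT, agent, url))
```

To check a live site instead of an inline string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the robots.txt before `can_fetch()` is called.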