New ask Hacker News story: Ask HN: Any truly multi-modal transformer architectures?

3 points by prats226 | 1 comment on Hacker News.
Most multi-modal architectures consume images as tokens in the same embedding dimension as text. Are there any architectures that treat text and images as first-class citizens and also produce image tokens interleaved with text?
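
To make the "images as tokens in the same dimension" part of the question concrete, here is a minimal sketch of the input side only: image patches are linearly projected to the same width as text embeddings and spliced into one sequence. Everything in it (the names `D_MODEL`, `PATCH`, `VOCAB`, `build_sequence`, the random weights, the toy 4x4 image) is an assumption made for illustration and does not describe any particular model. It does not cover the output side the question asks about, i.e. generating image tokens.

```python
import random

D_MODEL = 8   # shared embedding width (assumed for illustration)
PATCH = 2     # patch side length in pixels (assumed)
VOCAB = 16    # toy text vocabulary size (assumed)

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Text path: lookup table from token id to a D_MODEL vector.
text_embedding = rand_matrix(VOCAB, D_MODEL)
# Image path: linear projection from a flattened patch to a D_MODEL vector.
patch_projection = rand_matrix(PATCH * PATCH, D_MODEL)

def embed_text(token_ids):
    return [text_embedding[t] for t in token_ids]

def patchify(image):
    """Split a 2D grayscale image (list of rows) into flattened PATCH x PATCH patches."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - PATCH + 1, PATCH):
        for c in range(0, w - PATCH + 1, PATCH):
            patches.append([image[r + i][c + j]
                            for i in range(PATCH) for j in range(PATCH)])
    return patches

def embed_image(image):
    """Project every patch to D_MODEL, so it has the same shape as a text token."""
    return [[sum(p * patch_projection[k][d] for k, p in enumerate(patch))
             for d in range(D_MODEL)]
            for patch in patchify(image)]

def build_sequence(segments):
    """segments: list of ("text", token_ids) or ("image", image) in reading order."""
    sequence, modality = [], []
    for kind, payload in segments:
        vectors = embed_text(payload) if kind == "text" else embed_image(payload)
        sequence.extend(vectors)
        modality.extend([kind] * len(vectors))
    return sequence, modality

image = [[rng.random() for _ in range(4)] for _ in range(4)]
sequence, modality = build_sequence(
    [("text", [1, 2, 3]), ("image", image), ("text", [4, 5])]
)
print(len(sequence), modality)
```

After the projection, the transformer sees one flat sequence of equally sized vectors; the `modality` list is the only place the text/image distinction survives, which is roughly the limitation the question points at.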
