Running an experiment to see if gemini-embeddings-2, which is a multimodal embeddings model, is a viable substitute for OpenAI's text-embedding-3-small.
- The Gemini model accepts a specification of the embedding's purpose: retrieval (query or document) or semantic similarity. Would be interesting to understand exactly how this modifies the vector.
- Embedding text fragments in the retrieval mode gives me specific, rigid 'suggestions' (i.e., related/proximal fragments); these turn out to be less interesting than the OpenAI ones. The latter seem to have captured gestalt, or vibe better.
A possible challenge is that I need these embeddings to be both retrieval-optimised (for search) and semantic-similarity-optimised (for traversal, threading, and more creative tasks).
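One way to live with this tension, sketched below, is to store two vectors per fragment and pick one by use case. This is only an illustration under assumptions: the `Fragment` class, its field names, and the `"search"` / `"traversal"` mode labels are hypothetical, and the vectors are assumed to have been produced elsewhere by embedding the same text once per purpose. It doubles storage and embedding cost, so it only makes sense if the two modes really do behave differently.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Fragment:
    text: str
    retrieval_vec: list[float]   # assumed: embedded in a retrieval mode
    similarity_vec: list[float]  # assumed: embedded in a semantic-similarity mode


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], fragments: list[Fragment], mode: str) -> list[str]:
    """Rank fragment texts by similarity, using the vector that matches `mode`."""
    # Hypothetical mode names: "search" for lookup, "traversal" for discovery.
    attr = {"search": "retrieval_vec", "traversal": "similarity_vec"}[mode]
    scored = [(cosine(query_vec, getattr(f, attr)), f.text) for f in fragments]
    return [text for _, text in sorted(scored, key=lambda s: s[0], reverse=True)]
```

The query vector would need to be embedded in the matching mode too, so a search query and a traversal seed are not interchangeable.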
Testing different categories of fragment
- Literary (The Tartar Steppe)
- OpenAI related two fragments on the shortness of life that Gemini did not
- Gemini had fewer intra-artefact suggestions; it related across texts more
- Gemini had more literary relations: more from Calvino, Adams, Dostoyevsky
- Non-fiction Essay (Sam Kriss on Sudan)
- For a fragment about the ethnic makeups of the different sides of the conflict OpenAI suggested multiple fragments about various Turkic ethnicities. Gemini kept it about Sudan, and conflict in Africa.
- Gemini also had some obviously better results on herders in Sudan; it related this to herder fragments in Chaffetz, but not recklessly.
- A fragment on Velcheru Narayana Rao
- Gemini suggested fragments from a text by Rao; OpenAI didn't. This seems conclusive!
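These observations are impressions from reading the suggestion lists. A rough way to put numbers on two of them (how often a model stays inside the same artefact, and how much the two models' suggestions overlap) might look like the sketch below. The function names and the `(fragment_id, source)` tuple shape are assumptions made for illustration, not how Pond stores suggestions.

```python
def same_source_share(query_source: str, suggestions: list[tuple[str, str]]) -> float:
    """Fraction of suggestions that come from the same source as the query fragment.

    Each suggestion is a (fragment_id, source) pair.
    """
    if not suggestions:
        return 0.0
    same = sum(1 for _, source in suggestions if source == query_source)
    return same / len(suggestions)


def jaccard(ids_a: list[str], ids_b: list[str]) -> float:
    """Overlap between two models' suggestion lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A low same-source share would match the "related across texts more" impression, and a low Jaccard score would confirm that the two models are surfacing genuinely different neighbours rather than reordering the same ones.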
Thread building
- A thread on 'language as prescriptive'
- Gemini's results were narrower and deeper: a lot more from the same source/domain
- Dropping the thresholds to standardise them helped with this, but the Gemini thread was still narrower
- The thread was a 'theme' thread though, so this might even be desirable
- An exploratory thread on the value of style in literature and art
- OpenAI's was more 'exploratory' and interesting, and made more cross-domain connections
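On standardising thresholds: if the two models' raw cosine scores sit in different ranges (an assumption, not something measured here), a fixed cut-off is not comparable across them. One possible approach is to derive each model's threshold from its own score distribution, so that both keep the same share of candidates. The helper name `quantile_threshold` is hypothetical.

```python
def quantile_threshold(similarities: list[float], keep_fraction: float) -> float:
    """Return the score cut-off that keeps roughly the top `keep_fraction` of candidates."""
    if not similarities:
        raise ValueError("need at least one similarity score")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(similarities, reverse=True)
    keep = max(1, round(len(ranked) * keep_fraction))  # always keep at least one
    return ranked[keep - 1]
```

With this, a thread built at "top 30%" means the same thing for both models, so any remaining difference in breadth is down to what each model considers close, not to where the threshold happened to be set.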
Strange situation here: whatever helps with search precision works against Pond's value as a discovery tool.
Have just come to the conclusion that I don't need a multimodal model. Describing an image in text, then embedding that description with the rest, is the ideal approach: Pond is concerned more with the conceptual content of an image, which can be captured in text.