Embedding lookup retrieved from the language model.

This workflow illustrates how textual data are transformed into vector representations by a pre-trained language model. The leftmost column contains the original text inputs, which include both species descriptions and user queries. Each text is associated with a unique ID (middle column) for reference. The rightmost column shows the retrieved vector representations: numerical embeddings generated by the language model. Each row holds one vector, which captures the semantic meaning of the corresponding text, and each number in a vector is its value along one dimension of the vector space. These values are used to calculate the cosine similarity between vectors.
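
As a rough illustration of that last step, the sketch below computes cosine similarity between a query vector and two species vectors and ranks the species by score. It is a minimal example under stated assumptions, not the authors' implementation: the IDs (`species_1`, `species_2`, `query_1`), the four-dimensional vectors, and the helper `rank_species` are all hypothetical, and real embeddings from a language model typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical lookup table mapping text IDs to embeddings.
# The values are invented for illustration and do not come from the figure.
embeddings = {
    "species_1": [0.12, -0.45, 0.33, 0.08],
    "species_2": [-0.30, 0.22, 0.05, -0.41],
    "query_1": [0.10, -0.40, 0.30, 0.10],
}


def rank_species(query_id, table):
    """Sort species IDs by cosine similarity to the query, best match first."""
    query_vec = table[query_id]
    scores = {
        text_id: cosine_similarity(query_vec, vec)
        for text_id, vec in table.items()
        if text_id.startswith("species_")  # assumed ID naming convention
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_species("query_1", embeddings)
for text_id, score in ranking:
    print(f"{text_id}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, not on their lengths, a long description and a short query can still score as close matches when they point in a similar direction.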

 
Part of: Kao D-K, Yang C-K, Chen C-H (2024) Enhancing Plant Species Retrieval in Flora Through Language Model Integration. Biodiversity Information Science and Standards 8: e142132. https://doi.org/10.3897/biss.8.142132