Eino: Embedding guide
Basic Introduction
The Embedding component is used to convert text into vector representations. Its main function is to map text content into a high-dimensional vector space, so that semantically similar texts are closer in the vector space. This component plays an important role in the following scenarios:
- Text similarity calculation
- Semantic search
- Text clustering analysis
Component Definition
Interface Definition
type Embedder interface {
EmbedStrings(ctx context.Context, texts []string, opts ...Option) ([][]float64, error)
}
EmbedStrings Method
- Function: Convert a set of texts into vector representations
- Parameters:
- ctx: Context object, used to pass request-level information, and also for passing Callback Manager
- texts: List of texts to be converted
- opts: Conversion options, used to configure the conversion behavior
- Return values:
[][]float64
: List of vector representations corresponding to the texts, the dimension of each vector is determined by the specific implementation- error: Error information during the conversion process
Common Option
The Embedding component uses EmbeddingOption to define optional parameters. Below are the abstract common options. Each specific implementation can define its specific options, which can be wrapped into a unified EmbeddingOption type through the WrapEmbeddingImplSpecificOptFn function.
type Options struct {
// Model is the name of the model used to generate vectors
Model *string
}
Options can be set as follows:
// Set model name
WithModel(model string) Option
Usage
Standalone Usage
Code location: eino-ext/components/embedding/openai/examples/embedding
import "github.com/cloudwego/eino-ext/components/embedding/openai"
embedder, _ := openai.NewEmbedder(ctx, &openai.EmbeddingConfig{
APIKey: accessKey,
Model: "text-embedding-3-large",
Dimensions: &defaultDim,
Timeout: 0,
})
vectorIDs, _ := embedder.EmbedStrings(ctx, []string{"hello", "how are you"})
Usage in Orchestration
Code location: eino-ext/components/embedding/openai/examples/embedding
// Use in Chain
chain := compose.NewChain[[]string, [][]float64]()
chain.AppendEmbedding(embedder)
// Use in Graph
graph := compose.NewGraph[[]string, [][]float64]()
graph.AddEmbeddingNode("embedding_node", embedder)
Option and Callback Usage
Option Usage Example
// Use options (example for standalone usage)
vectors, err := embedder.EmbedStrings(ctx, texts,
embedding.WithModel("text-embedding-3-small"),
)
Callback Usage Example
Code location: eino-ext/components/embedding/openai/examples/embedding
import (
"github.com/cloudwego/eino/callbacks"
"github.com/cloudwego/eino/components/embedding"
"github.com/cloudwego/eino/compose"
callbacksHelper "github.com/cloudwego/eino/utils/callbacks"
"github.com/cloudwego/eino-ext/components/embedding/openai"
)
handler := &callbacksHelper.EmbeddingCallbackHandler{
OnStart: func(ctx context.Context, runInfo *callbacks.RunInfo, input *embedding.CallbackInput) context.Context {
log.Printf("input access, len: %v, content: %s\n", len(input.Texts), input.Texts)
return ctx
},
OnEnd: func(ctx context.Context, runInfo *callbacks.RunInfo, output *embedding.CallbackOutput) context.Context {
log.Printf("output finished, len: %v\n", len(output.Embeddings))
return ctx
},
}
callbackHandler := callbacksHelper.NewHandlerHelper().Embedding(handler).Handler()
chain := compose.NewChain[[]string, [][]float64]()
chain.AppendEmbedding(embedder)
// Compile and run
runnable, _ := chain.Compile(ctx)
vectors, _ = runnable.Invoke(ctx, []string{"hello", "how are you"},
compose.WithCallbacks(callbackHandler))
log.Printf("vectors in chain: %v", vectors)
Existing Implementations
- OpenAI Embedding: Generate vectors using OpenAI’s text embedding model Embedding - OpenAI
- ARK Embedding: Generate vectors using the ARK platform’s model Embedding - ARK
Custom Implementation Reference
When implementing a custom Embedding component, the following points need to be noted:
- Pay attention to handling common options
- Implement the callback mechanism properly
Option Mechanism
Custom Embedding needs to implement its own Option mechanism:
// Define the Option struct
type MyEmbeddingOptions struct {
BatchSize int
MaxRetries int
Timeout time.Duration
}
// Define the Option function
func WithBatchSize(size int) embedding.Option {
return embedding.WrapEmbeddingImplSpecificOptFn(func(o *MyEmbeddingOptions) {
o.BatchSize = size
})
}
Callback Handling
The Embedder implementation needs to trigger callbacks at appropriate times. The framework has defined standard callback input and output structs:
// CallbackInput is the input for the embedding callback
type CallbackInput struct {
// Texts are the list of texts to be converted
Texts []string
// Config is the configuration information for generating vectors
Config *Config
// Extra is additional information for the callback
Extra map[string]any
}
// CallbackOutput is the output for the embedding callback
type CallbackOutput struct {
// Embeddings are the list of generated vectors
Embeddings [][]float64
// Config is the configuration information for generating vectors
Config *Config
// TokenUsage is the token usage information
TokenUsage *TokenUsage
// Extra is additional information for the callback
Extra map[string]any
}
// TokenUsage is the token usage information
type TokenUsage struct {
// PromptTokens is the number of tokens in the prompt
PromptTokens int
// CompletionTokens is the number of tokens in the completion
CompletionTokens int
// TotalTokens is the total number of tokens
TotalTokens int
}
Complete Implementation Example
type MyEmbedder struct {
model string
batchSize int
}
func NewMyEmbedder(config *MyEmbedderConfig) (*MyEmbedder, error) {
return &MyEmbedder{
model: config.DefaultModel,
batchSize: config.DefaultBatchSize,
}, nil
}
func (e *MyEmbedder) EmbedStrings(ctx context.Context, texts []string, opts ...embedding.Option) ([][]float64, error) {
// 1. Handle the options
options := &MyEmbeddingOptions{
Options: &embedding.Options{},
BatchSize: e.batchSize,
}
options.Options = embedding.GetCommonOptions(options.Options, opts...)
options = embedding.GetImplSpecificOptions(options.Options, opts...)
// 2. Get the callback manager
cm := callbacks.ManagerFromContext(ctx)
// 3. Trigger the pre-generation callback
ctx = cm.OnStart(ctx, info, &embedding.CallbackInput{
Texts: texts,
Config: &embedding.Config{
Model: e.model,
},
})
// 4. Execute the vector generation logic
vectors, tokenUsage, err := e.doEmbed(ctx, texts, options)
// 5. Handle errors and trigger the completion callback
if err != nil {
ctx = cm.OnError(ctx, info, err)
return nil, err
}
ctx = cm.OnEnd(ctx, info, &embedding.CallbackOutput{
Embeddings: vectors,
Config: &embedding.Config{
Model: e.model,
},
TokenUsage: tokenUsage,
})
return vectors, nil
}
func (e *MyEmbedder) doEmbed(ctx context.Context, texts []string, opts *MyEmbeddingOptions) ([][]float64, *TokenUsage, error) {
// Implementation logic
return vectors, tokenUsage, nil
}