Hushh Vibe Catalog 
Hushh Catalog Format (HCF) files are a serialized format designed for rapid loading of embedding data for vector search. More information on the Hushh Catalog Format is available here.
Installation
You can install the library from PyPI:
python -m pip install hushh-vibe-catalog
You can install the development version from GitHub with:
python -m pip install git+https://github.com/hushh-labs/hushh-labs-catalog
Example
Creating an HCF file
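from pathlib import Path
# Assumed import path; adjust it to match the installed package layout.
from hushh_vibe_catalog import Brand, Catalog, Product

cat = Catalog("demo_catalog")
dummy_brand = Brand("dummy", "description", "dummy_url")
for filename in ["assets/bird.jpg", "assets/dog.jpg", "assets/cat.jpg"]:
    # Use the file name without directory or extension as the description.
    name = Path(filename).stem
    prod = Product(description=name, url="dummy_url", image_path=filename, brand=dummy_brand)
    cat.addProduct(prod)
print("Writing Catalog")
cat.to_hcf("catalog.hcf")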
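In the example above, each product's description is derived from its image filename with `Path.stem`, which drops the directory and the extension. A small standard-library sketch of that step, using the same filenames:

```python
from pathlib import Path

filenames = ["assets/bird.jpg", "assets/dog.jpg", "assets/cat.jpg"]
# Path.stem keeps only the final path component, minus its suffix.
names = [Path(f).stem for f in filenames]
print(names)
```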
Reading an HCF file
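import numpy as np
from sentence_transformers import util  # assumed source of util.semantic_search

# `catalog` is the library module that provides read_hcf.
# `tokenizer` and `model` are assumed to be an already loaded Hugging Face
# CLIP tokenizer and model matching the ones used to build the catalog.
cat = catalog.read_hcf("catalog.hcf")
pv = cat.ProductVibes()

# Rebuild each flat batch into its original shape, then stack the batches.
embeddings = []
for idx in range(pv.ProductTextBatchesLength()):
    batch = pv.ProductTextBatches(idx)
    embs = batch.FlatTensorAsNumpy()
    embs = embs.reshape(batch.ShapeAsNumpy())
    embeddings.append(embs)
embeddings = np.concatenate(embeddings)

# Embed the query text and look up the three closest products.
query = "dog"
inputs = tokenizer([query], padding=True, return_tensors="pt")
query_emb = model.get_text_features(**inputs)
hits = util.semantic_search(query_emb, embeddings, top_k=3)[0]
# Print the description of the best match.
print(cat.ProductVibes().Products(hits[0]['corpus_id']).Description())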
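The search step above ranks the stored embeddings against the query embedding and keeps the top three. As a rough illustration of that kind of ranking, here is a minimal NumPy-only sketch of cosine-similarity top-k search. `top_k_cosine` is a hypothetical helper, not part of the library, and the two-dimensional vectors are toy stand-ins for real embeddings.

```python
import numpy as np


def top_k_cosine(query_emb, corpus_embs, top_k=3):
    """Return (corpus_id, score) pairs for the top_k most similar rows."""
    query = np.asarray(query_emb, dtype=np.float32)
    corpus = np.asarray(corpus_embs, dtype=np.float32)
    # Normalise so that a dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = corpus @ query
    # Sort by descending score and keep the first top_k indices.
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy embeddings (illustrative values only).
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
hits = top_k_cosine(np.array([0.9, 0.1]), corpus, top_k=2)
print(hits[0][0])  # index of the best-matching row
```

The returned index plays the same role as `corpus_id` in the example above: it can be used to look up the matching product.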
Latest Schema
The latest version of the schema is as follows.
namespace hushh.hcf;

table Brand {
  id: string;
  description: string;
  name: string;
  url: string;
}

table Product {
  id: string;
  description: string;
  url: string;
  brand: Brand;
}

table Vibe {
  id: string;
  description: string;
  product_idx: [int];
}

table Category {
  id: string;
  description: string;
  url: string;
  product_idx: [int];
}

enum VibeMode : byte { ProductText = 0, ProductImage, Text, Image, Category }

table FlatEmbeddingBatch {
  id: string;
  shape: [int];
  vibe_mode: VibeMode;
  flat_tensor: [float];
  product_idx: [int];
}

table ProductVibes {
  id: string;
  products: [Product];
  brands: [Brand];
  categories: [Category];
  vibes: [Vibe];
  product_text_batches: [FlatEmbeddingBatch];
  product_image_batches: [FlatEmbeddingBatch];
  text_batches: [FlatEmbeddingBatch];
  image_batches: [FlatEmbeddingBatch];
}

table Catalog {
  id: string;
  version: string;
  description: string;
  product_vibes: ProductVibes;
  batch_size: int;
  tokenizer_name_or_path: string;
  model_name_or_path: string;
}

root_type Catalog;
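In the schema, a `FlatEmbeddingBatch` stores a tensor as a one-dimensional `flat_tensor` together with its `shape`. The `product_idx` field appears to tie each row back to an entry in `products`, although that reading is an assumption based on the field names. The sketch below mimics this layout with plain NumPy and a dictionary; `flatten_batch` and `restore_batch` are hypothetical helpers, not library functions.

```python
import numpy as np


def flatten_batch(embeddings, product_idx):
    """Pack a 2-D embedding matrix into a dict shaped like FlatEmbeddingBatch."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    return {
        "shape": list(embeddings.shape),
        "flat_tensor": embeddings.ravel().tolist(),
        "product_idx": list(product_idx),
    }


def restore_batch(batch):
    """Rebuild the 2-D matrix from the flat values and the stored shape."""
    flat = np.asarray(batch["flat_tensor"], dtype=np.float32)
    return flat.reshape(batch["shape"])


# Two products with three-dimensional toy embeddings.
original = np.arange(6, dtype=np.float32).reshape(2, 3)
batch = flatten_batch(original, product_idx=[0, 1])
restored = restore_batch(batch)
print(restored.shape)
```

Storing the shape next to the flat values is what lets a reader call `reshape` on each batch before concatenating, as in the reading example.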
Contributing
This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
- If you think you have encountered a bug, please submit an issue.