Hushh Vibe Catalog


Hushh Catalog Files (HCF) are a serialized format designed to load embedding data quickly for vector search. More information on the Hushh Catalog Format is available here.

Installation

You can install the library from PyPI:

python -m pip install hushh-vibe-catalog

You can install the development version from GitHub with:

python -m pip install git+https://github.com/hushh-labs/hushh-labs-catalog
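
To confirm that either install command worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("hushh-vibe-catalog"))
```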

Example

Creating an HCF file

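from pathlib import Path

# Assumption: Catalog, Brand and Product are importable from the package's
# top-level module; adjust the import to match your installed version.
from hushh_vibe_catalog import Brand, Catalog, Product

cat = Catalog("demo_catalog")
dummy_brand = Brand("dummy", "description", "dummy_url")
for filename in ["assets/bird.jpg", "assets/dog.jpg", "assets/cat.jpg"]:
    # Use the file name without its extension as the product description.
    name = Path(filename).stem
    prod = Product(description=name, url="dummy_url", image_path=filename, brand=dummy_brand)
    cat.addProduct(prod)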

print("Writing Catalog")
cat.to_hcf("catalog.hcf")

Reading an HCF file

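import numpy as np
from sentence_transformers import util

# Assumption: `catalog` is the library module that exposes read_hcf, and
# `tokenizer` and `model` are a CLIP-style tokenizer and model loaded
# beforehand (the same ones used to build the catalog).
cat = catalog.read_hcf("catalog.hcf")
pv = cat.ProductVibes()

# Restore each flat batch to its original shape, then stack all batches.
embeddings = []
for idx in range(pv.ProductTextBatchesLength()):
    batch = pv.ProductTextBatches(idx)
    embs = batch.FlatTensorAsNumpy()
    embs = embs.reshape(batch.ShapeAsNumpy())
    embeddings.append(embs)

embeddings = np.concatenate(embeddings)

# Embed the query text and look up the closest products.
query = "dog"
inputs = tokenizer([query], padding=True, return_tensors="pt")
query_emb = model.get_text_features(**inputs)
hits = util.semantic_search(query_emb, embeddings, top_k=3)[0]

# Print the description of the best match.
print(cat.ProductVibes().Products(hits[0]["corpus_id"]).Description())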
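
The search step above relies on `util.semantic_search`. As a rough illustration of what such a search does, the sketch below ranks rows by cosine similarity using plain NumPy. It is a stand-in under stated assumptions, not the library's implementation: the function name `top_k_cosine` is hypothetical, and small hand-written vectors replace real embeddings.

```python
import numpy as np


def top_k_cosine(query_emb, corpus_embs, top_k=3):
    """Return (index, score) pairs for the top_k most similar corpus rows."""
    # Normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the best top_k indices.
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy corpus: three 4-dimensional vectors standing in for product embeddings.
corpus = np.array(
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.7, 0.7, 0.0, 0.0],
    ],
    dtype=np.float32,
)
query = np.array([1.0, 0.1, 0.0, 0.0], dtype=np.float32)

hits = top_k_cosine(query, corpus, top_k=2)
print(hits[0][0])  # index of the closest corpus vector
```

The returned index plays the same role as `corpus_id` in the example above: it selects the matching product from the catalog.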

Latest Schema

The latest version of the schema is as follows.

namespace hushh.hcf;

table Brand {
  id: string;
  description: string;
  name: string;
  url: string;
}

table Product {
  id: string;
  description: string;
  url: string;
  brand: Brand;
}

table Vibe {
  id: string;
  description: string;
  product_idx: [int];
}

table Category {
  id: string;
  description: string;
  url: string;
  product_idx: [int];
}

enum VibeMode : byte { ProductText = 0, ProductImage, Text, Image, Category}

table FlatEmbeddingBatch {
  id: string;
  shape: [int];
  vibe_mode: VibeMode;
  flat_tensor: [float];
  product_idx: [int];
}

table ProductVibes {
  id: string;
  products: [Product];
  brands: [Brand];
  categories: [Category];
  vibes: [Vibe];
  product_text_batches: [FlatEmbeddingBatch];
  product_image_batches: [FlatEmbeddingBatch];
  text_batches: [FlatEmbeddingBatch];
  image_batches: [FlatEmbeddingBatch];
}

table Catalog {
  id: string;
  version: string;
  description: string;
  product_vibes: ProductVibes;
  batch_size: int;
  tokenizer_name_or_path: string;
  model_name_or_path: string;
}

root_type Catalog;
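
The `FlatEmbeddingBatch` table stores each batch of embeddings as a one-dimensional `flat_tensor` plus a `shape`, with `batch_size` recorded on the `Catalog`. The sketch below shows that round trip with NumPy. It assumes plain dictionaries in place of the generated FlatBuffers tables, and the helper names `to_flat_batches` and `from_flat_batches` are hypothetical; how the library itself fills `product_idx` is also an assumption here.

```python
import numpy as np


def to_flat_batches(embeddings, batch_size):
    """Split a 2-D embedding matrix into flat batches that remember their shape."""
    batches = []
    for start in range(0, len(embeddings), batch_size):
        chunk = embeddings[start:start + batch_size]
        batches.append(
            {
                "shape": list(chunk.shape),
                "flat_tensor": chunk.ravel().tolist(),
                # Assumed meaning: row i of the batch belongs to product_idx[i].
                "product_idx": list(range(start, start + len(chunk))),
            }
        )
    return batches


def from_flat_batches(batches):
    """Rebuild the full embedding matrix from flat batches."""
    parts = [
        np.asarray(b["flat_tensor"], dtype=np.float32).reshape(b["shape"])
        for b in batches
    ]
    return np.concatenate(parts)


# Five 2-dimensional embeddings split into batches of two rows each.
embeddings = np.arange(10, dtype=np.float32).reshape(5, 2)
batches = to_flat_batches(embeddings, batch_size=2)
restored = from_flat_batches(batches)
```

Storing the shape next to the flat data is what lets a reader call `reshape` on each batch, as in the reading example, without knowing the embedding dimension in advance.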

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.