Lists (9)
Stars
Letting AI Actively Manage Its Own Context
A framework for efficient model inference with omni-modality models
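《开源大模型食用指南》 (A Guide to Using Open-Source LLMs): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs), both Chinese and international, in a Linux environment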
A Datacenter Scale Distributed Inference Serving Framework
Fast CUDA matrix multiplication from scratch
The "Small Vision-Language Model" (SVLM) is a compact multimodal model tailored for beginners or users with limited computational resources. Its main goal is to optimize the integration of visual a…
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
A distributed approximate nearest neighbor (ANN) search library that provides high-quality toolkits for vector index building, search, and distributed online serving for large-scale vector search sc…
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
vsag is a vector indexing library used for similarity search.
Vector search engine inside Milvus, integrating FAISS, HNSW, DiskANN.
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …
Exercises in C - used at University of Bristol in COMSM1201
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
FudanSELab / train-ticket
Forked from hechuan73/train_ticket
Train Ticket - A Benchmark Microservice System
FunctionBench: A Suite of Workloads for Serverless Cloud Function Service
Automatic resource configuration for serverless workflows.
A new engine for Durable Functions. https://microsoft.github.io/durabletask-netherite
Virtual Memory Abstraction for Serverless Architectures
Thread pool implementation using C++11 threads