vectominist
🎯 Focusing

Organizations

@s3prl


[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 1,119 86 Updated Dec 23, 2024

High-Quality Voice Cloning TTS for 600+ Languages

Python 5,156 752 Updated May 6, 2026

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python 1,764 164 Updated May 6, 2026

Open-Source Frontier Voice AI

Python 46,713 5,183 Updated May 6, 2026

Alignment files of LibriTTS.

68 7 Updated Mar 16, 2020

Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset

Python 180 24 Updated Mar 25, 2019

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 9,886 1,056 Updated May 5, 2026

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 8,980 770 Updated Mar 26, 2026

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 3,799 309 Updated Mar 27, 2026

[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.

Python 31 2 Updated Feb 27, 2026

DACVAE

Python 218 18 Updated Dec 22, 2025

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python 72 8 Updated Mar 22, 2026

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook 166 16 Updated Apr 22, 2026

A benchmark for evaluating audio encoders on various audio tasks.

Python 53 9 Updated Apr 27, 2026

State-of-the-art pretrained music models for training, evaluation, inference

Python 177 19 Updated Jan 20, 2026

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 999 77 Updated Jul 10, 2025

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 930 64 Updated Oct 28, 2024

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook 582 85 Updated Jan 31, 2026

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Python 53 5 Updated Jan 18, 2024

Mamba SSM architecture

Python 18,180 1,720 Updated May 3, 2026

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,774 1,175 Updated Apr 8, 2026

FAIR Sequence Modeling Toolkit 2

Python 1,131 140 Updated Apr 27, 2026

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python 6,421 996 Updated Apr 4, 2025

[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas

Python 758 45 Updated May 21, 2025

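A Bob plugin for LLM-based text translation, text polishing, and grammar correction. Let's welcome a new era that no longer needs a Tower of Babel! Licensed under CC BY-NC-SA 4.0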

TypeScript 5,651 258 Updated May 2, 2026

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python 399 36 Updated Sep 11, 2023

CUDA implementation of autoregressive linear attention, with all the latest research findings

Python 46 3 Updated May 23, 2023

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Python 1,146 100 Updated Nov 24, 2025

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,267 219 Updated Apr 13, 2026