vectominist

🎯

Focusing

Heng-Jui Chang vectominist

PhD Candidate @ MIT CSAIL. Speech Processing and Balloon Arts.

89 followers · 17 following

Massachusetts Institute of Technology
Cambridge, MA
00:19 (UTC -04:00)
people.csail.mit.edu/hengjui
@hjchang87

Achievements

Highlights

Pro

Organizations

Stars

ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python · 1,119 stars · 86 forks · Updated Dec 23, 2024

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python · 5,156 stars · 752 forks · Updated May 6, 2026

OpenMOSS / MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python · 1,764 stars · 164 forks · Updated May 6, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python · 46,713 stars · 5,183 forks · Updated May 6, 2026

kan-bayashi / LibriTTSLabel

Alignment files of LibriTTS.

68 stars · 7 forks · Updated Mar 16, 2020

CorentinJ / librispeech-alignments

Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset

Python · 180 stars · 24 forks · Updated Mar 25, 2019

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 9,886 stars · 1,056 forks · Updated May 5, 2026

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python · 8,980 stars · 770 forks · Updated Mar 26, 2026

duoan / TorchCode

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook · 3,799 stars · 309 forks · Updated Mar 27, 2026

Tencent / StableToken

[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.

Python · 31 stars · 2 forks · Updated Feb 27, 2026

facebookresearch / dacvae

DACVAE

Python · 218 stars · 18 forks · Updated Dec 22, 2025

Labbeti / aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python · 72 stars · 8 forks · Updated Mar 22, 2026

Audio-WestlakeU / ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook · 166 stars · 16 forks · Updated Apr 22, 2026

jimbozhang / xares

A benchmark for evaluating audio encoders on various audio tasks.

Python · 53 stars · 9 forks · Updated Apr 27, 2026

Red-Killer / shit

3,985 stars · 263 forks · Updated Feb 15, 2026

a43992899 / MARBLE

State-of-the-art pretrained music models for training, evaluation, inference

Python · 177 stars · 19 forks · Updated Jan 20, 2026

kuleshov-group / bd3lms

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python · 999 stars · 77 forks · Updated Jul 10, 2025

facebookresearch / spiritlm

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python · 930 stars · 64 forks · Updated Oct 28, 2024

mitmath / matrixcalc

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook · 582 stars · 85 forks · Updated Jan 31, 2026

Alexander-H-Liu / dinosr

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Python · 53 stars · 5 forks · Updated Jan 18, 2024

state-spaces / mamba

Mamba SSM architecture

Python · 18,180 stars · 1,720 forks · Updated May 3, 2026

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook · 11,774 stars · 1,175 forks · Updated Apr 8, 2026

facebookresearch / fairseq2

FAIR Sequence Modeling Toolkit 2

Python · 1,131 stars · 140 forks · Updated Apr 27, 2026

rtqichen / torchdiffeq

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python · 6,421 stars · 996 forks · Updated Apr 4, 2025

ChenyangLEI / All-In-One-Deflicker

[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas

Python · 758 stars · 45 forks · Updated May 21, 2025

nextai-translator / bob-plugin-openai-translator

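A Bob plugin for LLM-based text translation, text polishing, and grammar correction. Let's welcome together a new era that no longer needs the Tower of Babel! Licensed under CC BY-NC-SA 4.0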

TypeScript · 5,651 stars · 258 forks · Updated May 2, 2026

facebookresearch / muavic

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python · 399 stars · 36 forks · Updated Sep 11, 2023

lucidrains / autoregressive-linear-attention-cuda

CUDA implementation of autoregressive linear attention, with all the latest research findings

Python · 46 stars · 3 forks · Updated May 23, 2023

iver56 / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Python · 1,146 stars · 100 forks · Updated Nov 24, 2025

iver56 / audiomentations

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python · 2,267 stars · 219 forks · Updated Apr 13, 2026