Our Partner

HausaNLP is an open-source community passionate about advancing Hausa Natural Language Processing. Through collaboration, research, and resource development, they aim to empower the Hausa language through technology and innovation.


Problem

Hausa is the second most spoken language in Africa, with an estimated 64 million speakers. Despite its widespread use, natural language processing (NLP) for Hausa remains significantly underdeveloped. HausaNLP is an open-source community of academics, researchers, students, ML engineers, and NLP enthusiasts dedicated to democratizing Hausa NLP by developing Hausa language resources, promoting NLP research, and fostering collaboration among relevant stakeholders.

Solution

We collaborated with HausaNLP to build a comprehensive Hausa language data catalog, a centralized platform where researchers and developers can easily access datasets and models. The platform also allows users to upload and share their own contributions, ensuring that resources remain accessible, community-driven, and relevant to Hausa-speaking communities across West Africa and the diaspora.

Design

Our primary users are researchers who often need to create and curate specialized datasets, so we centered the design on what they need in order to publish and find particular datasets. We designed a straightforward upload process that lets users add their datasets and tag them with key metadata for easy discovery in the catalog. We also implemented advanced filtering to help users quickly find what they need. To encourage contributions, we added a contributors page where users can choose to be acknowledged for their work on the platform.

Tech Stack

HausaNLP Data Catalog is a full-stack Next.js application that was designed and developed from scratch. The backend uses MongoDB as the database, Prisma for data access, and tRPC for API calls. The frontend is built with React, Tailwind CSS, and Material UI. TypeScript is used throughout the application. HausaNLP Data Catalog is deployed on Vercel, with the database hosted on MongoDB Atlas.

HausaNLP Tech Stack

Features

Data Catalog

To help researchers quickly find the public datasets most relevant to their projects, this feature provides a structured way to browse, filter, and navigate dataset collections. Each dataset is described by detailed metadata (name, link, year, language, collection style, size, unit, and associated task), so users can easily compare options and move from high-level exploration to the dataset itself.
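
As a rough illustration of how metadata-based filtering like this could work, the sketch below models one catalog entry and a simple filter function. The field names, the `DatasetFilter` shape, and the sample entries are assumptions made for the example; they are not taken from the project's actual schema or data.

```typescript
// Hypothetical shape of one catalog entry. The fields mirror the metadata
// listed above, but the names and types are assumptions.
interface DatasetEntry {
  name: string;
  link: string;
  year: number;
  language: string;
  collectionStyle: string;
  size: number;
  unit: string;
  task: string;
}

// Hypothetical filter: optional exact-match criteria plus an optional year range.
interface DatasetFilter {
  language?: string;
  task?: string;
  collectionStyle?: string;
  minYear?: number;
  maxYear?: number;
}

// Keep only the entries that satisfy every criterion that was provided.
function filterDatasets(entries: DatasetEntry[], filter: DatasetFilter): DatasetEntry[] {
  return entries.filter(
    (d) =>
      (filter.language === undefined || d.language === filter.language) &&
      (filter.task === undefined || d.task === filter.task) &&
      (filter.collectionStyle === undefined || d.collectionStyle === filter.collectionStyle) &&
      (filter.minYear === undefined || d.year >= filter.minYear) &&
      (filter.maxYear === undefined || d.year <= filter.maxYear)
  );
}

// Made-up sample entries, for illustration only.
const sampleCatalog: DatasetEntry[] = [
  { name: "example-sentiment-corpus", link: "https://example.org/sentiment", year: 2021,
    language: "Hausa", collectionStyle: "crowdsourced", size: 10000, unit: "sentences",
    task: "sentiment analysis" },
  { name: "example-news-corpus", link: "https://example.org/news", year: 2023,
    language: "Hausa", collectionStyle: "web scraped", size: 500, unit: "articles",
    task: "topic classification" },
  { name: "example-parallel-corpus", link: "https://example.org/parallel", year: 2019,
    language: "Hausa-English", collectionStyle: "crowdsourced", size: 2000,
    unit: "sentence pairs", task: "machine translation" },
];
```

With this sample data, `filterDatasets(sampleCatalog, { language: "Hausa", minYear: 2020 })` keeps the first two entries and drops the third.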

Data Catalog Image

Dataset/Model Upload Form

This workflow provides a straightforward three-step process for contributors to upload their datasets: users fill in the necessary information about their dataset, submit the form, and then wait for an admin to review the contribution. Once the review is complete, the dataset can be approved and published on the website.

Upload Form Image 2

Admin Dataset Review

Admins can review all pending dataset submissions and choose to either approve or deny them. When a submission is approved, the dataset is added to the catalog; when it is denied, the admin supplies a reason for the denial so the contributor knows what changes are needed.
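
The submit, review, and approve-or-deny flow described above can be sketched as a small state transition. This is a minimal sketch under assumed names (`Submission`, `ReviewDecision`, `reviewSubmission`), not the project's actual implementation; the one rule it encodes from the text is that a denial carries a reason.

```typescript
// Hypothetical submission states: a submission starts as "pending" and an
// admin moves it to "approved" or "denied". Only denied submissions carry a reason.
type Submission =
  | { status: "pending"; datasetName: string }
  | { status: "approved"; datasetName: string }
  | { status: "denied"; datasetName: string; reason: string };

// Hypothetical admin decision: a denial must supply a reason.
type ReviewDecision =
  | { action: "approve" }
  | { action: "deny"; reason: string };

// Apply an admin decision to a pending submission and return the new state.
function reviewSubmission(submission: Submission, decision: ReviewDecision): Submission {
  if (submission.status !== "pending") {
    throw new Error(`Submission is already ${submission.status}`);
  }
  if (decision.action === "approve") {
    return { status: "approved", datasetName: submission.datasetName };
  }
  // Reject blank reasons so the contributor always learns what to change.
  const reason = decision.reason.trim();
  if (reason.length === 0) {
    throw new Error("A denial must include a reason");
  }
  return { status: "denied", datasetName: submission.datasetName, reason };
}
```

Modeling the states as a discriminated union lets the type checker enforce that a reason exists only on denied submissions, which is one possible way to keep the review rule from being skipped by accident.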

Dataset Review Image

Admin Dashboard

The admin dashboard enables ongoing maintenance of the platform by letting admins review and update existing datasets in the catalog. It also provides an easy way to switch between pending submissions and the main catalog, allowing admins to manage new contributions and maintain published datasets in one place.

Admin Dashboard Image

Statistics Page

The statistics page aggregates information from every dataset currently in the HausaNLP catalog and summarizes it by attributes such as year and language. Compiling this data across the entire catalog helps users quickly understand the distribution and characteristics of the datasets available on the platform.
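
One simple way to produce this kind of overview is to count datasets per attribute value. The sketch below assumes a hypothetical `DatasetSummary` shape and a generic `countBy` helper; the real page may aggregate differently or inside the database.

```typescript
// Hypothetical minimal view of a dataset for statistics purposes.
interface DatasetSummary {
  year: number;
  language: string;
}

// Count how many items fall under each key produced by keyOf.
function countBy<T>(items: T[], keyOf: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const key = keyOf(item);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Made-up sample data, for illustration only.
const summaries: DatasetSummary[] = [
  { year: 2021, language: "Hausa" },
  { year: 2021, language: "Hausa-English" },
  { year: 2023, language: "Hausa" },
];

const datasetsPerYear = countBy(summaries, (d) => String(d.year));
const datasetsPerLanguage = countBy(summaries, (d) => d.language);
```

For the sample data, `datasetsPerYear` maps `"2021"` to 2 and `"2023"` to 1, and `datasetsPerLanguage` maps `"Hausa"` to 2 and `"Hausa-English"` to 1.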

Statistics Page Image

Contributors Page

The contributors page highlights individuals who have added datasets to the HausaNLP data catalog and serves to recognize and acknowledge their work. Contributors are displayed on the page only if they provide consent, ensuring that recognition is both accurate and voluntary.
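
The consent rule can be expressed as a filter applied before anything is rendered. The `Contributor` fields and the alphabetical ordering below are assumptions for illustration; only the two conditions (has added datasets, has given consent) come from the description above.

```typescript
// Hypothetical contributor record. consentToDisplay stands in for whatever
// opt-in flag the platform stores.
interface Contributor {
  name: string;
  consentToDisplay: boolean;
  datasetCount: number;
}

// Return only contributors who opted in and have added at least one dataset.
// Sorting by name is an assumed presentation choice. filter() creates a new
// array, so the input is not mutated by sort().
function visibleContributors(all: Contributor[]): Contributor[] {
  return all
    .filter((c) => c.consentToDisplay && c.datasetCount > 0)
    .sort((a, b) => a.name.localeCompare(b.name));
}

// Made-up sample data, for illustration only.
const sampleContributors: Contributor[] = [
  { name: "Contributor C", consentToDisplay: true, datasetCount: 2 },
  { name: "Contributor A", consentToDisplay: true, datasetCount: 1 },
  { name: "Contributor B", consentToDisplay: false, datasetCount: 5 },
  { name: "Contributor D", consentToDisplay: true, datasetCount: 0 },
];
```

Here `visibleContributors(sampleContributors)` returns Contributor A followed by Contributor C. Contributor B is left out for lack of consent, and Contributor D for having no datasets.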

Contributors Page Image

About Us Page

The About Us page provides information about the HausaNLP website and its mission, helping users understand the purpose of the platform. It also offers ways for users to stay connected, giving visitors a clear sense of what the project aims to achieve and how they can follow its work.

About Us Image

Meet The Team

Vy Nguyen, Product Manager
Benjamin Chang, Tech Lead
Azaan Shaikh, Product Designer
Sirihaasa Nallamothu, Software Developer
Bhuvana Betini, Software Developer
Evan Lin, Software Developer
Sophie Lin, Software Developer
Keshav Subramonian, Software Developer