Problem
GlobalGiving connects with organizations based in the US and nonprofits in other countries to create a comprehensive directory of nonprofits. After building a web scraper for them in Fall 2018 to search for and collect data on nonprofit organizations, we needed a way to filter through this new information.
Solution
While the first semester focused primarily on the breadth of the search for nonprofits, this semester’s product focused on its depth: the specifics of each nonprofit. We built a data science tool that analyzes the websites and mission statements of nonprofits, groups them according to what they do, and suggests what each organization’s work may be. Ultimately, we aim to give GlobalGiving as much knowledge and foresight as possible when they reach out to new nonprofits to join their global network.
Tech Stack
Python, scikit-learn, Gensim, NLTK, Beautiful Soup, DynamoDB, spaCy
Features
Revised Categorization Schema
The revised schema reflects the patterns we found through experimentation and investigation, and suggests whom each nonprofit serves and what it does.
Stochastic Gradient Descent
We used scikit-learn's Stochastic Gradient Descent classifier with a OneVsRest classifier to perform multiclass classification and visualize the data of an NGO.
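As a rough illustration of this kind of setup, the sketch below wraps scikit-learn's `SGDClassifier` in a `OneVsRestClassifier` and trains it on TF-IDF features. The mission statements, category labels, TF-IDF step, and hyperparameters are all hypothetical stand-ins chosen for the example. They are not the project's actual data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical mission statements and categories (illustrative only).
texts = [
    "We provide clean drinking water to rural villages",
    "Building wells and water filters for communities",
    "Free primary school education for girls",
    "Scholarships and school supplies for children",
    "Mobile clinics delivering vaccines and medicine",
    "Training nurses and supplying rural health clinics",
]
labels = ["water", "water", "education", "education", "health", "health"]

# TF-IDF features feed one linear SGD classifier per category (one-vs-rest).
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(
        SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
    ),
)
model.fit(texts, labels)

# Predict a category for an unseen (also hypothetical) mission statement.
predicted = model.predict(["We build schools and train teachers"])
print(predicted[0])
```

With only a handful of training examples the prediction is not meaningful. The point is only the shape of the pipeline: text to vectors, then one binary classifier per category.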
Document Vectors
Using Doc2Vec, we created a document vector for each project and clustered the vectors with k-means to find similarities between projects.
Unguided LDA
The Unguided LDA class creates a model representing the “topics” present in a dataset of nonprofit projects.