
NicolasCastro / GeneralDataMining

Project info

License: MIT
Tags: metrics, data mining, extras, gender
Creation date: 2014-08-04


About GeneralDataMining

A set of data mining tools.

Purpose

This project is intended as a sandbox for data mining objects.

Basics

aProvider <---> Heuristics ---> anEstimator

Heuristics decompose an input into n-grams.

Providers supply information about each n-gram detected by the heuristics and return it as samples.

Estimators compute the final result from those samples.
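The Provider, Heuristics and Estimator flow can be illustrated with a short Pharo Playground script. This is a minimal sketch under stated assumptions, not the project's implementation: a Dictionary stands in for a provider, a majority-vote block stands in for an estimator, and the heuristic is assumed to split the input on separator characters into lowercase word-level n-grams (n = 1). None of the variable names below are GeneralDataMining classes.

```smalltalk
"Hypothetical stand-ins, not GeneralDataMining classes:
 a Dictionary plays the provider and a block plays the estimator."
| input ngrams provider samples estimator estimate |
input := 'Martin@Spain!'.

"Heuristic (assumed): split on separator characters and lowercase each piece,
 giving word-level n-grams with n = 1."
ngrams := (input substrings: '@.!_-+ 0123456789') collect: [ :each | each asLowercase ].

"Provider (assumed): maps a lowercase n-gram to a gender sample."
provider := Dictionary new.
provider at: 'martin' put: #male; at: 'maria' put: #female.

"Each n-gram the provider knows becomes one sample; unknown n-grams are dropped."
samples := (ngrams collect: [ :each | provider at: each ifAbsent: [ nil ] ])
	reject: [ :each | each isNil ].

"Estimator (assumed): majority vote over the samples."
estimator := [ :someSamples |
	someSamples isEmpty
		ifTrue: [ #unknown ]
		ifFalse: [ someSamples asSet detectMax: [ :each | someSamples occurrencesOf: each ] ] ].

estimate := estimator value: samples.
Transcript show: estimate printString; cr.
```

With these sample entries, 'Martin@Spain!' yields the n-grams 'martin' and 'spain', only 'martin' produces a sample, and the vote answers #male. Keeping the provider and the estimator as separate pieces is what lets either one be swapped without touching the heuristic.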

Current heuristics

Genders: using one of several gender providers (web providers such as iGender and genderize, or a local provider), the heuristics can determine a gender from a given name, a user name or an e-mail address.

Example

Getting a user name's gender from a web provider and computing the samples with the group gender estimator:

TWDMHeuristics genderFromComplexName: 'Martin@Spain!' providedBy: (TWDMGenderAPIProvider new) estimatedBy: (TWDMGroupGenderEstimator new)

About the Local Gender Provider

Since the local provider relies on a dictionary of names, you need to obtain one first. The official dictionary is available at https://github.com/Ztrint/OpenDatabases.
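To give an idea of what a local lookup involves, here is a hypothetical Pharo sketch that loads name/gender pairs into a Dictionary and queries it. The one-pair-per-line "name,gender" format, the sample entries, the file name 'names.csv' and the #unknown fallback are all assumptions for illustration; they are not the documented format of the OpenDatabases files or the behaviour of the project's local provider.

```smalltalk
"Hypothetical local lookup; the 'name,gender' line format is an assumption.
 To read a downloaded file instead, something like:
 contents := 'names.csv' asFileReference contents."
| contents dictionary lookup |
contents := 'martin,male
maria,female
andrea,unisex'.

"Parse each line into a lowercase name key and a gender symbol; skip malformed lines."
dictionary := Dictionary new.
contents lines do: [ :line |
	| fields |
	fields := line substrings: ','.
	fields size = 2 ifTrue: [
		dictionary at: fields first asLowercase put: fields second asSymbol ] ].

"Case-insensitive lookup that answers #unknown for names missing from the dictionary."
lookup := [ :aName | dictionary at: aName asLowercase ifAbsent: [ #unknown ] ].
Transcript show: (lookup value: 'Maria') printString; cr.
```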