NicolasCastro / GeneralDataMining
Project info
| License | MIT |
| Tags | metrics, data mining, extras, gender |
| Creation date | 2014-08-04 |
| Website |
About GeneralDataMining
A set of data mining tools.
Purpose
This project aims to be a sandbox for data mining objects.
Basics
aProvider <---> Heuristics ---> anEstimator
Heuristics decompose an input into n-grams.
Providers look up information about each n-gram detected by the heuristics and return samples.
Estimators determine the result based on samples.
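The three roles above can be sketched as a short Pharo playground script. This is only an illustration under stated assumptions, not the project's implementation: it uses none of the TWDM classes, treats each separator-delimited token as a unigram, uses a two-entry toy dictionary as a stand-in for a local provider, and uses a simple majority vote as a stand-in for an estimator. All variable names are hypothetical.

```smalltalk
"Illustrative sketch only: these names and steps are assumptions,
 not part of GeneralDataMining."
| input tokens dictionary samples maleCount femaleCount result |
input := 'Martin@Spain!'.

"Heuristic role: break the input into lowercase tokens (treated as unigrams)."
tokens := (input substrings: '@!._- ') collect: [ :each | each asLowercase ].

"Provider role: a toy local dictionary mapping known names to a gender."
dictionary := Dictionary new.
dictionary at: 'martin' put: #male.
dictionary at: 'maria' put: #female.

"Keep only the tokens the provider knows, then fetch one sample per token."
samples := (tokens select: [ :each | dictionary includesKey: each ])
	collect: [ :each | dictionary at: each ].

"Estimator role: majority vote over the samples; ties and no data give #unknown."
maleCount := samples count: [ :each | each = #male ].
femaleCount := samples count: [ :each | each = #female ].
result := maleCount = femaleCount
	ifTrue: [ #unknown ]
	ifFalse: [ maleCount > femaleCount
		ifTrue: [ #male ]
		ifFalse: [ #female ] ].
result
```

Keeping the three roles separate means a web lookup could replace the dictionary, or a different voting rule could replace the majority vote, without touching the tokenizing step.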
Current heuristics
Genders: using different gender providers (web providers such as iGender and genderize, or a local provider), Heuristics can determine gender from a given name, a user name, or an e-mail address.
Example
Getting a username's gender from a web provider and computing the result from the samples with the group gender estimator:
TWDMHeuristics genderFromComplexName: 'Martin@Spain!' providedBy: (TWDMGenderAPIProvider new) estimatedBy: (TWDMGroupGenderEstimator new)
About the Local Gender Provider
The local provider relies on a dictionary of names, so you need to obtain one. The official dictionary is available at https://github.com/Ztrint/OpenDatabases.
