Check out my repositories on GitHub and Bitbucket!
Neural mapping of natural language navigational instructions to actions using LSTM recurrent neural networks
Based on the work by Hongyuan Mei, Mohit Bansal, and Matthew R. Walter (link here), implemented
in TensorFlow.
The authors' implementation of the complete model can be found in Hongyuan Mei's repository here (coming soon).
Discourse influence in speakers' choice of referring expressions
Implementation based on the work by N. Orita, E. Vornov, N. Feldman, and H. Daumé III, 2015 (link here). The paper proposes a
speaker model of language production that incorporates updates to the listener's beliefs as the discourse proceeds.
The model accounts for discourse salience, the cost of speech production, and the probabilities of unseen referents (obtained from an external lexicon).
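To make the interaction between salience and production cost concrete, here is a minimal sketch in the style of a rational speaker model. It is not necessarily the paper's exact formulation: the literal listener's prior is taken to be discourse salience, the speaker's utility is the log listener probability minus a word cost, and `alpha`, the cost values, and the `applies` predicate are hypothetical. The unseen-referent probabilities from the external lexicon are left out.

```python
import math

def listener(word, referents, salience, applies):
    """Literal listener: P_L(r | w) proportional to salience[r] over referents that w can denote."""
    scores = {r: (salience[r] if applies(word, r) else 0.0) for r in referents}
    total = sum(scores.values())
    if total == 0:
        return scores  # the word cannot denote any referent
    return {r: s / total for r, s in scores.items()}

def speaker(referent, words, referents, salience, applies, cost, alpha=1.0):
    """Speaker: P_S(w | r) proportional to exp(alpha * (log P_L(r | w) - cost[w]))."""
    utilities = {}
    for w in words:
        p = listener(w, referents, salience, applies)[referent]
        if p > 0:  # words that cannot pick out the referent get zero probability
            utilities[w] = math.exp(alpha * (math.log(p) - cost[w]))
    total = sum(utilities.values())
    return {w: u / total for w, u in utilities.items()}

# Hypothetical toy setting: a cheap pronoun that fits both referents vs. costlier names.
REFERENTS = ["Ana", "Rosa"]
WORDS = ["she", "Ana", "Rosa"]
COST = {"she": 0.5, "Ana": 2.0, "Rosa": 2.0}

def applies(word, referent):
    """'she' can refer to either woman; a name only to its bearer."""
    return word == "she" or word == referent
```

Calling `speaker("Ana", WORDS, REFERENTS, {"Ana": 0.9, "Rosa": 0.1}, applies, COST)` gives the pronoun a higher probability than the same call with Ana's salience lowered to 0.1, which is the qualitative discourse effect the model is meant to capture.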
Meteor plasma trail classifier
SVM classifier for meteor plasma trails detected in radar echoes. The echo signal is modeled as a grey-scale image, and segmentation is performed to isolate the meteor trails from the background noise.
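The segmentation step can be illustrated with a small sketch. The page does not say which segmentation method is used, so this assumes a global threshold at mean + k·std of the image, followed by 4-connected component labeling and a few simple shape features per region. The threshold rule, `k`, and the feature set are hypothetical; the SVM itself is not shown (a library class such as scikit-learn's `SVC` would typically consume feature tuples like these).

```python
from collections import deque
from statistics import mean, pstdev

def segment(image, k=2.0):
    """Binary mask: True where a pixel is brighter than mean + k * std of the whole image."""
    pixels = [v for row in image for v in row]
    threshold = mean(pixels) + k * pstdev(pixels)
    return [[v > threshold for v in row] for row in image]

def connected_components(mask):
    """Label 4-connected foreground regions; return one list of (row, col) pixels per region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

def region_features(region):
    """Simple shape features (area, height, width) that could be fed to a classifier."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return (len(region), max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
```

An elongated trail yields a region whose width is much larger than its height, while isolated noise pixels yield tiny regions; features of this kind are what would separate the two classes.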
[code]
Labor market analysis using Topic Models and Shallow Parsing
Code for the work presented in the paper of the same name: an analysis of the skills that Peruvian industry seeks in engineers. Topic models are used to examine
the relationship between job requirements and job functions, both previously extracted with a shallow parser. All data was
extracted from job-hunting websites, preprocessed, and manually annotated.
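The page does not name the topic model. Assuming plain LDA, a compact collapsed Gibbs sampler shows how topics are estimated from documents given as lists of integer word ids. How the requirement and function phrases from the shallow parser are turned into documents is not specified here, and the hyperparameters are illustrative defaults, not the values used in the paper.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total words assigned to each topic
    z = []                                              # topic assignment of every token
    for d, doc in enumerate(docs):                      # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove the token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                             # resample and add it back
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # Posterior mean estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(n_topics)]
    return theta, phi
```

`theta[d]` is the topic mixture of document `d` and `phi[k]` the word distribution of topic `k`. For a corpus of this size a library implementation (for example gensim's `LdaModel`, which uses variational inference instead of Gibbs sampling) would be the practical choice.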
Interactive data visualizations and a results explorer are available at these links:
- Relationship between all professional majors in Peru. [link]
- Topic content browser, based on the work of Chaney and Blei (2012). [link]
Rule-based professional major extractor
Professional majors requested in job ads are extracted using regular expressions and matched to predefined major identifiers. Details in this paper.
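A minimal sketch of the regex-to-identifier idea follows. The identifiers (`ING_SISTEMAS`, etc.) and patterns are hypothetical examples written for Spanish job ads; the actual list of majors and expressions is described in the paper, not here.

```python
import re

# Hypothetical major identifiers and patterns; accents are optional because ads often omit them.
MAJOR_PATTERNS = {
    "ING_SISTEMAS": r"ingenier[ií]a\s+(de\s+)?sistemas",
    "ING_INDUSTRIAL": r"ingenier[ií]a\s+industrial",
    "ING_CIVIL": r"ingenier[ií]a\s+civil",
}
COMPILED = {major: re.compile(p, re.IGNORECASE) for major, p in MAJOR_PATTERNS.items()}

def extract_majors(ad_text):
    """Return the sorted identifiers of every major whose pattern occurs in the ad text."""
    return sorted(major for major, rx in COMPILED.items() if rx.search(ad_text))
```

Mapping every surface variant to one identifier keeps downstream counts per major consistent even when ads spell the same degree differently.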
[code]
Simultaneous Localization and Mapping and Path Planning for Ackermann-steered mobile robots
SLAM simulation for Ackermann-steered mobile robots using EKF and particle filters, tested on landmark maps and occupancy grid maps of university facilities.
For path planning, a Hybrid A* algorithm was implemented and tested on an HPI Racing buggy (a scale-model sand car) in indoor environments.
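Both the EKF/particle-filter prediction step and Hybrid A* node expansion need a motion model for the vehicle. A common choice for Ackermann steering is the kinematic bicycle approximation, sketched below under that assumption (the page does not say which model was used). Only successor generation is shown, not the full Hybrid A* search; the wheelbase, steering limit, and step length in the usage note are hypothetical, not the buggy's measured values.

```python
import math

def ackermann_step(x, y, theta, v, steer, wheelbase, dt):
    """One Euler step of the kinematic bicycle approximation of an Ackermann vehicle."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += v / wheelbase * math.tan(steer) * dt
    return x, y, theta

def hybrid_astar_successors(state, max_steer, wheelbase, step_len, n_sub=10):
    """Continuous successor poses for Hybrid A*: drive step_len forward and backward
    with full-left, straight and full-right steering, integrating in n_sub sub-steps."""
    successors = []
    for direction in (1.0, -1.0):              # +1 forward, -1 reverse (unit speed)
        for steer in (-max_steer, 0.0, max_steer):
            x, y, theta = state
            for _ in range(n_sub):
                x, y, theta = ackermann_step(x, y, theta, direction, steer,
                                             wheelbase, step_len / n_sub)
            successors.append(((x, y, theta), steer, direction))
    return successors
```

For example, `hybrid_astar_successors((0.0, 0.0, 0.0), max_steer=0.5, wheelbase=0.3, step_len=1.0)` returns six continuous poses. Hybrid A* would then snap each pose to a discrete grid cell for duplicate detection while keeping the continuous pose, which is what makes the resulting paths drivable by a car-like robot.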
Job Ad corpus annotated for Named Entity Recognition tasks
Consists of 800 job ads, each tokenized and manually annotated with POS tags (EAGLES Spanish tagset) and entity labels in BIO format. Details in this paper.
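In BIO format each token is tagged `B-X` (beginning of an entity of type X), `I-X` (inside it), or `O` (outside any entity). A short sketch of turning such tags back into entity spans follows; the labels `PROF` and `SKILL` in the example are hypothetical, not the corpus's actual label set.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, start, end, text) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        # An entity ends at "O", at a new "B-", or at an "I-" whose type does not match.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.append((label, start, i, " ".join(tokens[start:i])))
                label = None
            if tag != "O":
                label, start = tag[2:], i  # open a new entity
    return spans

# Hypothetical example sentence and labels.
TOKENS = ["Se", "busca", "Ingeniero", "Industrial", "con", "experiencia", "en", "SAP"]
TAGS = ["O", "O", "B-PROF", "I-PROF", "O", "O", "O", "B-SKILL"]
```

Treating a stray `I-X` that does not continue an `X` entity as the start of a new entity is a lenient choice; a stricter reader could reject such sequences as annotation errors instead.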
[annotated_data.zip]
Entities extracted from Job Ads corpus
Consists of nearly 9,000 job ads sampled from the database, tokenized, and filtered to remove low-frequency words and tokens of no interest (phone numbers, salaries, office hours, emails, URLs). Shallow parsers then extract the relevant phrases. This dataset is used in the topic model module of this paper.
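The filtering step can be sketched as a noise filter followed by a corpus-wide frequency cut-off. The regular expressions below are hypothetical stand-ins for the token types listed above (written with Peruvian formats such as `S/.` salaries in mind), and `min_count` is an illustrative threshold; the project's actual filters are not published on this page.

```python
import re
from collections import Counter

# Hypothetical noise patterns, one per token type; each must match the whole token.
NOISE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"^https?://\S+$|^www\.\S+$",     # URLs
        r"^[\w.+-]+@[\w-]+\.[\w.-]+$",    # emails
        r"^\+?\d[\d-]{5,}\d$",            # phone numbers such as 987654321
        r"^(S/\.?|\$)\d[\d,.]*$",         # salaries such as S/.2500 or $1,200
        r"^\d{1,2}:\d{2}(am|pm)?$",       # office hours such as 8:30am
    )
]

def is_noise(token):
    """True if the token matches any of the noise patterns."""
    return any(p.match(token) for p in NOISE_PATTERNS)

def filter_corpus(docs, min_count=2):
    """Drop noise tokens, then drop tokens seen fewer than min_count times (case-insensitive)."""
    cleaned = [[t for t in doc if not is_noise(t)] for doc in docs]
    counts = Counter(t.lower() for doc in cleaned for t in doc)
    return [[t for t in doc if counts[t.lower()] >= min_count] for doc in cleaned]
```

Removing noise before counting matters: otherwise frequent boilerplate such as a recruiter's shared email address would survive the frequency cut-off.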
[link to repo]
Complete preprocessed Job Ads corpus
More than 900k job ads extracted from popular Latin American job-hunting websites. The NLTK tokenizer was extended to capture technical words typical of this kind of advertisement.
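The idea of keeping technical terms intact can be shown with a standalone regular-expression tokenizer. This sketch does not use NLTK itself, and the term list is hypothetical; a pattern string like `TOKEN_PATTERN` is the kind of input NLTK's `RegexpTokenizer` accepts, but the project's actual extension is not shown on this page.

```python
import re

# Hypothetical technical terms that a generic tokenizer would split apart.
TECH_TERMS = ["C++", "C#", "ASP.NET", ".NET", "Node.js", "AutoCAD", "PL/SQL"]

# Longest terms first so "ASP.NET" wins over ".NET"; re.escape makes "+" and "." literal.
_tech = "|".join(re.escape(t) for t in sorted(TECH_TERMS, key=len, reverse=True))
# (?i) makes term matching case-insensitive; fallbacks are word runs and single punctuation marks.
TOKEN_PATTERN = rf"(?i)(?:{_tech})|\w+|[^\w\s]"
TOKEN_RE = re.compile(TOKEN_PATTERN)

def tokenize(text):
    """Split text into tokens, keeping the listed technical terms as single tokens."""
    return TOKEN_RE.findall(text)
```

Because the alternation tries the technical terms before the generic rules, `C++` stays one token instead of becoming `C`, `+`, `+`.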
[available upon request]