Code & Datasets

Check out my repositories on GitHub and Bitbucket!

Code

Neural mapping of natural language navigational instructions to actions using RNN LSTM

Based on the work by Hongyuan Mei, Mohit Bansal, and Matthew R. Walter (link here), implemented in TensorFlow.
The authors' implementation of the entire model can be found in Hongyuan Mei's repository here (coming soon).
[code]
Code

Discourse influence in speakers' choice of referring expressions

Implementation based on the work by N. Orita, E. Vornov, N. Feldman, H. Daumé III, 2015 (link here). The paper proposes a speaker model of language production that incorporates updates to listeners' beliefs as discourse proceeds. The model accounts for discourse salience, the cost of speech production, and the probabilities of unseen referents (obtained from an external lexicon).

[code]
Code

Discourse analysis: topicality and salience

Implementation based on the work by N. Orita, E. Vornov, N. H. Feldman, and J. Boyd-Graber, 2014 (link here). The paper presents an analysis of the role of referent topicality in speakers' choice of referring expressions.
[code]
Code

Meteor plasma trail classifier

SVM classifier for meteor plasma trails captured by radar echo. The echo signal is modeled as a grey-scale image and segmentation is performed in order to isolate the meteor trails from the background noise.
[code]
Code

Labor market analysis using Topic Models and Shallow Parsing

Code for the work presented in the paper of the same name. Analysis of the skills sought in engineers in Peruvian industry. Topic models are used to examine the relationship between requirements and functions, previously extracted by a shallow parser. All data was extracted from job-hunting websites, preprocessed, and manually annotated.
Interactive data visualizations and a results explorer are available at these links:
- Relationship between all professional majors in Peru. [link]
- Topic content browser, based on the work of Chaney and Blei (2012). [link]
[code]
Code

Rule-based professional major extractor

Professional majors requested in job ads are extracted using regular expressions and matched against predefined major identifiers. Details in this paper.
[code]
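As a rough illustration of the approach, the following minimal sketch maps regular expressions to major identifiers. The identifiers (`CIVIL_ENG`, `SYSTEMS_ENG`, `ACCOUNTING`), the patterns, and the function name `extract_majors` are hypothetical and are not taken from the repository; the sketch assumes the ads are written in Spanish.

```python
import re

# Hypothetical identifiers and patterns, for illustration only.
# The actual rules used in the project are not reproduced here.
MAJOR_PATTERNS = {
    "CIVIL_ENG": re.compile(
        r"\bingenier[íi]a\s+civil\b|\bingenier[oa]s?\s+civil(?:es)?\b",
        re.IGNORECASE,
    ),
    "SYSTEMS_ENG": re.compile(
        r"\bingenier[íi]a\s+de\s+sistemas\b|\bingenier[oa]s?\s+de\s+sistemas\b",
        re.IGNORECASE,
    ),
    "ACCOUNTING": re.compile(
        r"\bcontabilidad\b|\bcontador(?:a|es)?\b",
        re.IGNORECASE,
    ),
}


def extract_majors(ad_text):
    """Return the sorted identifiers of every major whose pattern matches the ad."""
    return sorted(
        major_id
        for major_id, pattern in MAJOR_PATTERNS.items()
        if pattern.search(ad_text)
    )


if __name__ == "__main__":
    ad = "Se busca ingeniero civil o ingeniero de sistemas con experiencia."
    print(extract_majors(ad))
```

Keeping one compiled pattern per identifier makes it straightforward to add a new major without touching the matching logic.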
Code

Simultaneous Localization and Mapping and Path Planning for Ackermann model mobile robots

SLAM simulation for Ackermann mobile robots using EKF and particle filters, tested on landmark maps and occupancy grid maps of university facilities.
For path planning, a Hybrid A* algorithm was implemented and tested on an HPI Racing Buggy (a scaled sand car) in indoor environments.
[code]

Dataset

Job Ad corpus annotated for Named Entity Recognition tasks

Consists of 800 job ads, each tokenized and manually annotated with POS tag information (EAGLES Spanish format) and entity labels in BIO format. Details in this paper.
[annotated_data.zip]
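For readers unfamiliar with BIO labels, the sketch below shows how a tagged token sequence can be grouped back into entity spans. The label name `MAJOR`, the example sentence, and the function name `bio_to_spans` are hypothetical; the corpus's actual label set and file layout are described in the paper and are not assumed here.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (label, text) spans from parallel BIO tags.

    "B-X" opens an entity of type X, "I-X" continues it, and "O" marks
    tokens outside any entity.
    """
    spans = []
    current_label, current_tokens = None, []

    def close_span():
        # Store the open entity, if any.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the open entity.
            close_span()
            current_label, current_tokens = None, []
    close_span()
    return spans


if __name__ == "__main__":
    # Hypothetical example; "MAJOR" is an illustrative label name.
    tokens = ["Se", "busca", "ingeniero", "civil", "con", "experiencia"]
    tags = ["O", "O", "B-MAJOR", "I-MAJOR", "O", "O"]
    print(bio_to_spans(tokens, tags))
```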
Dataset

Entities extracted from Job Ads corpus

Consists of nearly 9000 job ads sampled from the database, tokenized and filtered to remove low-frequency words and tokens of no interest (phone numbers, salaries, office hours, emails, URLs). The shallow parsers then extract the relevant phrases. This dataset was used in the topic model module of this paper.
[link to repo]
Dataset

Complete preprocessed Job Ads corpus

More than 900k job ads extracted from popular Latin American job-hunting websites. The NLTK tokenizer was extended to capture technical words typical of this kind of advertisement.
[available upon request]

Ronald A. Cardenas

Ph.D. Student at CDT in NLP

University of Edinburgh
