Code & Datasets


Check out my repositories on GitHub and Bitbucket!

  • Code

    Neural mapping of natural language navigational instructions to actions using an LSTM-RNN

    Based on the work by Hongyuan Mei, Mohit Bansal, and Matthew R. Walter (link here), implemented in TensorFlow.
    The authors' implementation of the full model can be found in Hongyuan Mei's repository here (coming soon).

    [code]
  • Code

    Discourse influence in speakers' choice of referring expressions

    Implementation based on the work by N. Orita, E. Vornov, N. Feldman, and H. Daumé III, 2015 (link here). The paper proposes a speaker model of language production that incorporates updates to listeners' beliefs as discourse proceeds. The model accounts for discourse salience, the cost of speech production, and the probabilities of unseen referents (obtained from an external lexicon).

    [code]
  • Code

    Discourse analysis: topicality and salience

    Implementation based on the work by N. Orita, E. Vornov, N. H. Feldman, and J. Boyd-Graber, 2014 (link here). The paper presents an analysis of the role of referent topicality in speakers' choice of referring expressions.

    [code]
  • Code

    Meteor plasma trail classifier

    SVM classifier for meteor plasma trails captured in radar echoes. The echo signal is modeled as a grayscale image, and segmentation is performed to isolate the meteor trails from the background noise.

    [code]
  • Code

    Labor market analysis using Topic Models and Shallow Parsing

    Code for the work presented in the paper of the same name, which analyzes the skills sought in engineers in Peruvian industry. Topic models are used to examine the relationship between requirements and functions, previously extracted by a shallow parser. All data was extracted from job-hunting websites, preprocessed, and manually annotated.
    Interactive data visualizations and a results explorer are available at these links:
    - Relationship between all professional majors in Peru. [link]
    - Topic content browser, based on the work of Chaney and Blei (2012). [link]

    [code]
  • Code

    Rule-based professional major extractor

    Professional majors requested in job ads are extracted using regular expressions and matched against a predefined set of major identifiers. Details in this paper.

    [code]
  • Code

    Simultaneous Localization and Mapping and Path Planning for Ackermann-model mobile robots

    SLAM simulation for Ackermann mobile robots using EKF and particle filters, tested on landmark maps and occupancy grid maps of university facilities.
    For path planning, a Hybrid A* algorithm was implemented and tested on an HPI Racing Buggy (a scaled sand car) in indoor environments.

    [code]
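
As a rough illustration of the rule-based professional major extractor listed above, the following minimal sketch matches regular expressions against job ad text and maps each hit to a major identifier. The patterns, the identifiers, and the function name `extract_majors` are hypothetical and chosen only for this example; they are not taken from the repository.

```python
import re

# Hypothetical patterns and identifiers, for illustration only.
MAJOR_PATTERNS = {
    "civil_engineering": re.compile(
        r"\bing(?:enier(?:o|a|ía)|\.)?\s+civil\b", re.IGNORECASE
    ),
    "industrial_engineering": re.compile(
        r"\bing(?:enier(?:o|a|ía)|\.)?\s+industrial\b", re.IGNORECASE
    ),
    "accounting": re.compile(
        r"\bcontador(?:a|es)?\b|\bcontabilidad\b", re.IGNORECASE
    ),
}


def extract_majors(ad_text):
    """Return the sorted identifiers of all majors whose pattern matches the ad."""
    return sorted(
        major
        for major, pattern in MAJOR_PATTERNS.items()
        if pattern.search(ad_text)
    )


ad = "Se busca Ingeniero Civil o Ing. Industrial con experiencia"
majors = extract_majors(ad)
```

Keeping one compiled pattern per identifier makes it easy to add spelling variants and abbreviations for a major without touching the matching logic.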

  • Dataset

    Job Ad corpus annotated for Named Entity Recognition tasks

    Consists of 800 job ads, each one tokenized and manually annotated with POS tag information (EAGLES Spanish format) and entity labels in BIO format. Details in this paper.

    [annotated_data.zip]
  • Dataset

    Entities extracted from Job Ads corpus

    Consists of nearly 9,000 job ads sampled from the database, tokenized and filtered to remove low-frequency words and tokens of no interest (phone numbers, salaries, office hours, emails, URLs). The shallow parsers then extract the relevant phrases. This dataset is used in the topic model module of this paper.

    [link to repo]
  • Dataset

    Complete preprocessed Job Ads corpus

    More than 900k job ads extracted from popular Latin American job-hunting websites. The NLTK tokenizer was extended to capture technical words typical of this kind of advertisement.

    [available upon request]
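
For readers unfamiliar with the BIO format mentioned for the annotated job ad corpus above, the sketch below shows how token-level entity spans map to BIO tags: `B-` marks the first token of an entity, `I-` marks its continuation, and `O` marks tokens outside any entity. The label `MAJOR`, the example sentence, and the function name `spans_to_bio` are assumptions made for illustration; the corpus's actual label set is described in the paper.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the entity
    return tags


# Hypothetical example sentence and entity label.
tokens = ["Se", "busca", "ingeniero", "civil", "con", "experiencia"]
spans = [(2, 4, "MAJOR")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```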