
Vector Representation with a Finance Corpus

Submission Number: 142
Submission ID: 258
Submission UUID: 92db66c8-a836-4c3c-90f6-87e912e5b2d1
Submission URI: /form/project

Created: Thu, 03/03/2022 - 13:55
Completed: Thu, 03/03/2022 - 13:55
Changed: Wed, 05/17/2023 - 15:34

Submitted by: Gaurav Khanna
Language: English

Is draft: No
Webform: Project
bash, batch-jobs, deep-learning, distributed-computing, machine-learning, programming, python, research-facilitation, ssh

Project Leader

Murat Aydogdu

Project Personnel

Ritesh Bachhar

Project Information

This project entails generating word vector representations from two corpora, one general-purpose and one finance-specific, using the GloVe implementation. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The steps are: extract text from two sets of documents and build the two corpora, train GloVe on each corpus, and generate the corresponding vector representations. These representations will then be used to analyze how a domain-specific corpus affects the resulting word vectors.
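One way to carry out the final comparison step is to check how much the nearest neighbors of a word change between the general-purpose and the finance vectors. The sketch below assumes GloVe's plain-text output format (one word per line followed by its vector components). The function names, the Jaccard neighbor-overlap measure, and the toy 2-d vectors are illustrative assumptions, not the project's actual analysis.

```python
import numpy as np


def load_glove_text(lines):
    """Parse GloVe-style text vectors: a word followed by its float components."""
    words, rows = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    vecs = np.array(rows, dtype=np.float64)
    # L2-normalize each row so a dot product equals cosine similarity
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    vecs = vecs / np.where(norms == 0, 1.0, norms)
    return words, vecs


def nearest_neighbors(word, words, vecs, k=5):
    """Return the k words with the highest cosine similarity to `word`."""
    index = {w: i for i, w in enumerate(words)}
    target = index[word]  # raises KeyError if `word` is not in this vocabulary
    sims = vecs @ vecs[target]
    order = np.argsort(-sims)
    return [words[i] for i in order if i != target][:k]


def neighbor_overlap(word, model_a, model_b, k=5):
    """Jaccard overlap of the k nearest neighbors of `word` in two models."""
    a = set(nearest_neighbors(word, *model_a, k=k))
    b = set(nearest_neighbors(word, *model_b, k=k))
    return len(a & b) / len(a | b)


# Hypothetical toy 2-d vectors standing in for the general-purpose and finance models
general = load_glove_text(["bank 1 0", "river 0.9 0.1", "loan 0.1 0.9", "money 0.2 0.8"])
finance = load_glove_text(["bank 1 0", "river 0 1", "loan 0.95 0.05", "money 0.9 0.1"])
print(nearest_neighbors("bank", *general, k=1))  # ['river']
print(nearest_neighbors("bank", *finance, k=1))  # ['loan']
print(neighbor_overlap("bank", general, finance, k=2))  # 1 shared of 3 distinct neighbors
```

A low overlap for a word such as "bank" would suggest that the finance corpus shifts its meaning. In practice the comparison should be restricted to words present in both vocabularies.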

This project will require storage space for the large corpora and computing power to train GloVe on them. A computing platform such as URI’s HPC cluster or MGHPCC will be used for these tasks. The student facilitator will help the project PI set up the computational workflow in an HPC environment, i.e., develop and test the job submission scripts and install the required software and data on the chosen computational resource.
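A common pattern for spreading text extraction over a cluster is a job array in which each task processes its own share of the documents. The project page does not name the scheduler, so this sketch assumes Slurm and its job-array environment variables (`SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN`, `SLURM_ARRAY_TASK_COUNT`). The helper names and file names are hypothetical, and a contiguous array range (for example `sbatch --array=0-9`) is assumed.

```python
import os


def shard_for_task(paths, task_index, num_tasks):
    """Round-robin split: task i gets every num_tasks-th document, starting at i."""
    if not 0 <= task_index < num_tasks:
        raise ValueError(f"task_index {task_index} outside [0, {num_tasks})")
    ordered = sorted(paths)  # a stable order so every task sees the same list
    return ordered[task_index::num_tasks]


def current_shard(paths):
    """Select this array task's documents from Slurm job-array variables.

    Assumes a contiguous array such as --array=0-9 or --array=1-10;
    falls back to a single task (all documents) when not run under Slurm.
    """
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_min = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return shard_for_task(paths, task_id - task_min, num_tasks)


if __name__ == "__main__":
    # Hypothetical document list; a real job would glob the corpus directory
    docs = ["report_c.txt", "report_a.txt", "report_b.txt", "report_d.txt"]
    print(current_shard(docs))
```

Sorting before splitting matters because every array task runs independently; each one must derive the same ordering to avoid processing a document twice or skipping it.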

Project Information Subsection

A tested computational workflow for GloVe-based vector representations in an HPC environment.
Experience writing and running Python programs on a large number of datasets in a distributed computing environment such as an HPC cluster.
Some hands-on experience
Rhode Island College
Rhode Island
CR-University of Rhode Island
Already behind
5
Start date is flexible
  • Milestone Title: Milestone #1
    Milestone Description: Background study (vector representation, GloVe); HPC access; project overview and edits to the initial code; GitHub repo setup
    Completion Date Goal: 2022-06-08
    Actual Completion Date: 2022-06-08
  • Milestone Title: Milestone #2
    Milestone Description: Testing code to extract text from the corpora on a small scale; setting up job submission scripts on the HPC cluster; extensively testing how outputs are combined; running the GloVe program to produce small-scale results
    Completion Date Goal: 2022-07-08
    Actual Completion Date: 2022-07-08
  • Milestone Title: Milestone #3
    Milestone Description: Executing the project at scale and generating results; presenting the results in a Zoom "wrap" presentation; contributing the developed code, scripts, and documentation to the GitHub repo.
    Completion Date Goal: 2022-08-10
    Actual Completion Date: 2022-08-10
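Milestone #2's step of combining output can be illustrated with co-occurrence counts, the statistics GloVe is trained on. If each job counts co-occurrences on its own shard of documents and windows never cross document boundaries, the shard results can simply be summed. This is a minimal pure-Python sketch with hypothetical function names and a toy window size; it is not the project's code, and at scale GloVe's own `cooccur` tool does this counting in C.

```python
from collections import Counter


def count_cooccurrences(docs, window=2):
    """Symmetric co-occurrence counts for one shard of tokenized documents.

    A pair of words d positions apart contributes 1/d, following the
    distance weighting described in the GloVe paper. Windows never cross
    document boundaries, so shards can be counted independently.
    """
    counts = Counter()
    for tokens in docs:
        for i, center in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                weight = 1.0 / d
                counts[(center, tokens[j])] += weight  # right context
                counts[(tokens[j], center)] += weight  # mirror for symmetry
    return counts


def merge_counts(shard_counts):
    """Combine per-shard outputs by summing counts for each word pair."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # Counter.update adds values rather than replacing
    return total


# Hypothetical pre-tokenized shards
shard_a = [["the", "bank", "rate"]]
shard_b = [["bank", "rate"]]
merged = merge_counts([count_cooccurrences(shard_a), count_cooccurrences(shard_b)])
# Merging per-shard counts should match counting all documents at once
assert merged == count_cooccurrences(shard_a + shard_b)
print(merged[("bank", "rate")])  # 2.0
```

The equality check at the end is the kind of small-scale test that can confirm a combining step before running the workflow on the full corpora.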

Final Report

Apart from developing an impactful resource that makes it possible to train GloVe on large data volumes using HPC computing power, the project had no other significant impact on its own discipline.
The student facilitator gained substantial experience working with an HPC resource and will apply that experience in other areas of science, including his own area of interest, computational physics. The project had no other significant impact on other disciplines.
Yes; the student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option.
Yes; there is now a complete and tested HPC workflow for GloVe computations.
This project was somewhat complex for the time allotted and for the student's background, and it progressed more slowly than expected. However, with significant input from the project researcher and URI's HPC team, the project was completed successfully: the researcher is satisfied with the outcome and is able to run complex GloVe workflows in an HPC environment.
The project developed and tested a complete HPC workflow for GloVe-related computations. The student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option.