11. Path to Ph.D.

Starting Point:

Background:

YCP BA; minors in HR, finance, and accounting.

YCP MBA business science (Lean, Six Sigma, etc.)

Worked as Process Engineer, HRIS, etc.

HU Master's in Analytics: case study in data analysis on Spanish language acquisition and pedagogy. This work entailed a literature review on Spanish linguistics and a comparison of current language-learning technologies and techniques. A new method (inspired by the Guidonian hand mnemonic device) was conceptualized and tested; data were collected and analyzed, and the results were visualized using R.

Ideas for Ph.D.:

  1. Gamification of linguistics:

    • Language acquisition in a VR environment / an exploration of online learning environments; this included a focus on three-dimensional ultrasound imaging (internal and external) for articulatory data collection/monitoring, and an exploration of language acquisition via various stimuli.

    • Road blocks= Spoke to Ford and five other teachers. There was partial positive feedback but not full support. No traction; it did not appear to be a fit for a Ph.D. in data science.

  2. Projects of cultural and environmental significance:

    • Conservation via the collection and monitoring of acoustic data related to animals and humans; deep learning for entity identification (animal acoustics and noise pollution), and monitoring of endangered languages (on social media).

    • Road blocks= Spoke to Ford. There was partial positive feedback but not full support from Dr. Ford. While I was able to find external support, I was unable to find internal support members and so could not form a committee from HU members. No traction; it did not appear to be a fit for a Ph.D. in data science.

  3. Projects related to understanding language vitality:

    • Since no language vitality survey has been conducted in Saint Lucia since the 1970s, it is currently difficult to assess whether the heritage language, Saint Lucian Kwéyòl (Antillean Creole/Patois/Patwa), is thriving or flagging. I proposed an exploration, collection, and analysis of social media occurrences of creole statements, along with a review of creole in pop culture (such as songs) and in literary works. This would be bolstered by conducting a census-type survey to curate a language dataset for assessing the state of vitality (or of endangerment or extinction) of Saint Lucian Kwéyòl (Antillean Creole/Patois). See draft online form. I also created a language acquisition/reinforcement tool: a word search puzzle using HTML, CSS, and JS.

    • Road blocks= Spoke to Ford. While I was able to find external support, I was unable to find internal support members and so could not form a committee from HU members. No traction; it did not appear to be a fit for a Ph.D. in data science.

  4. Index creation:

    • Creation of a new social index utilizing sentiment analysis of social media platform commentary, such as Tripadvisor reviews; this metric could be a new factor (visitor perception) to combine with existing metrics for calculating a country's global social ranking. This would explore the sentiments concerning a tourist's experience, as well as the reasons they chose the destination: was it eco-tourism, health tourism, poverty tourism, etc.? This could be generalized, for example by basing it on the country's top five attractions and the ratings/commentary those receive.

    • Road blocks= Spoke to Ford. There was partial positive feedback, but not support, from Dr. Ford. I was unable to find internal support members and so could not form a committee from HU members. No traction; it did not appear to be a fit for a Ph.D. in data science.

  5. Presented on using data science for environmental analysis with index creation:

    • Proposed and presented to HU members a multi-paper, cross-domain attempt at highlighting ‘the duality of the fragility of endangered animals (sounds) and endangered human languages’; it was a biocultural diversity effort to encourage conservation through acoustic analysis and monitoring of endangered animals, and through monitoring of the endangered heritage language Saint Lucian Kwéyòl (on social media via sentiment analysis, etc.).

    • Road blocks= Spoke to Ford, Ashby, Arvid, and five other teachers. While I was able to find external support, there was no additional internal support, so I was unable to form a committee. No traction; it did not appear to be a fit for a Ph.D. in data science. I was advised to focus more on business or tourism, or anything I had a background in, or to focus on one area (not a multi-paper publication route).

  6. Began to focus solely on acoustic analysis:

    • Attempted to evaluate the current acoustic methods of tracking endangered species and noise pollution.

    • Road blocks= While I was able to find external support from Cornell, there was no additional internal support, so I was unable to form a committee from HU members. It did not appear to be a fit for a Ph.D. in data science.

  7. Explored traumatic brain injury and language retention/acquisition on a recommendation from Dr. Ford:

    • This was based on the idea that Maryland would be able to offer worthwhile datasets. I read up on neurolinguistics and Alzheimer’s detection; this focused on using text analysis or articulatory data collection/monitoring.

    • Road blocks= Was not able to access the data right away; when the data finally became available, it did not appear substantial enough for a Ph.D. in data science.

  8. COVID pivot - Began focusing on data science pedagogy and methodology:

    • Some research slowed down, as some people had to pivot or redesign ideas that relied on in-person interaction with test subjects, which the pandemic made difficult.

    • Explored the improvement of data science pedagogy and methodology; wanted to build a dissertation around a new framework for teaching data science to address current systemic flaws. I wanted to explore a possible cross-domain approach to ethical data science projects by highlighting the similarities in skills and technologies between digital humanities and biocultural diversity, and the possible merger of the two. This idea used Maslow’s hierarchy of needs as inspiration for the classification of topics; the inclusion of Maslow’s hierarchy was a nod to my business background as well as to the noted increase in mentions of this concept in various academic fields during the pandemic. I wanted to explore topics inspired by each level of the hierarchy, or how a topic could be classified into that hierarchy.

    • Road blocks= There was partial internal support from Dr. Ford, but I was unable to find other internal support members and so could not form a committee from HU members. No traction; it did not appear to be a fit for a Ph.D. in data science.

  9. Using Maslow’s need for safety and security and the lens of linguistics, created an association between linguistics and crime; thought I could add to the data science field through data curation and analysis:

    • Wanted to create an NLP project on legal phrase detection; hoped to use entity (phrase) detection to predict success in court. Also wanted to convert an existing linguistic vitality survey to an online survey (a demographic and linguistic census-type survey of lawyers in places where an endangered language is spoken, covering how their linguistic capacities helped or hindered client representation). See draft online form and draft presentation.

    • Road blocks= Spoke to Purcell, Ashby, and Ford. There was partial internal support from Dr. Ford; there was some traction, but he suggested focusing more on sentiment analysis and using the language as a background for the dataset. Unable to form a committee from HU members. It did not appear to be a fit for a Ph.D. in data science.

  10. Current state:

  • Delving into NLP and Machine translation and sentiment analysis:

Currently exploring Cross-Lingual Sentiment Analysis (might look at ‘Integrating Gated Recurrent Unit with Genetic Algorithm’); using LIWC with Machine Translation.

Perhaps a ‘Phrase-based sentiment detection for cross-lingual machine translation for small languages and under-resourced (low-resource) domains: A Sentiment Analysis System for the Saint Lucian Kwéyòl (Antillean Creole/Patois) Language by Integrating Gated Recurrent Unit with Genetic Algorithm’.

I do want to see if I can salvage anything concerning biocultural diversity; particularly, the vitality of the Saint Lucian Kwéyòl (Antillean Creole/Patois/Patwa) language via sentiment analysis of social media platform commentary, such as Tripadvisor reviews, or Facebook and Twitter activity. This would be an exploration, collection, and analysis of social media occurrences of creole statements, along with reviews of creole in pop culture (such as songs/hymns) and in works of literature.

I am interested in leveraging the natural annotation provided by a crowdsourced online dictionary, Wiwords, to actively/continuously address word sense disambiguation issues that arise when analyzing social media (Facebook, Instagram, Twitter, etc.) text data in a low-resource language (Saint Lucian Creole). By improving word sense disambiguation for the creole language, one may be better able to assess the language’s vitality by measuring the frequency of its usage in social media posts.

See a draft of expanded discussion.
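As a rough illustration of the frequency-of-usage idea above, the sketch below estimates what share of a set of posts appear to be written in Kwéyòl by matching tokens against a word list. Everything here is an assumption for illustration: the five-word `LEXICON` is a placeholder (spellings would need to be checked against the actual dictionary), the function names are hypothetical, and the 0.5 threshold is arbitrary. It does no word sense disambiguation, so words shared with English or French would be miscounted; that is exactly the gap the dictionary annotation is meant to address.

```python
import re
import unicodedata

# Placeholder word list for illustration only; a real run would load
# headwords from the dictionary dataset instead.
LEXICON = {
    unicodedata.normalize("NFC", w)
    for w in ("bonjou", "mèsi", "dlo", "kay", "manjé")
}


def tokenize(text):
    """Lowercase, normalize accents to NFC, and split into letter-only tokens."""
    text = unicodedata.normalize("NFC", text.lower())
    return re.findall(r"[^\W\d_]+", text)


def creole_share(text, lexicon=LEXICON):
    """Fraction of tokens in one post that appear in the lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


def usage_rate(posts, threshold=0.5, lexicon=LEXICON):
    """Fraction of posts whose lexicon share meets the (arbitrary) threshold."""
    if not posts:
        return 0.0
    flagged = sum(1 for post in posts if creole_share(post, lexicon) >= threshold)
    return flagged / len(posts)
```

Tracking `usage_rate` over time windows (for example, per month) would give one crude vitality signal; it is not a substitute for the census-type survey.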

  • Current inspirations:

Cross-Lingual Sentiment Analysis with Machine Translation

Co-training for cross-lingual sentiment classification

Cross-lingual mixture model for sentiment classification

Language Identification of Guadeloupean Creole

  • Additional notes:

Access is available to a labeled dictionary XML dataset in the target language (but no sentiment data).

Quite a few language apps have been developed thanks to SIL’s webapps; however, there was none for Saint Lucian Kwéyòl. So, I am currently reviewing SIL’s Online Dictionary-Making & Lexicography Course using the XML files for SIL’s Kwéyòl Dictionary.
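A minimal sketch of how a dictionary XML file might be read into a lookup table is shown here. The element names (`entry`, `headword`, `pos`, `gloss`) and the two sample entries are hypothetical; the real SIL files will use their own schema, so the tag names would need to be adjusted after inspecting them.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure for illustration; not the actual SIL schema.
SAMPLE_XML = """
<dictionary>
  <entry>
    <headword>dlo</headword>
    <pos>noun</pos>
    <gloss>water</gloss>
  </entry>
  <entry>
    <headword>kay</headword>
    <pos>noun</pos>
    <gloss>house</gloss>
  </entry>
</dictionary>
"""


def load_entries(xml_text):
    """Parse dictionary XML into {headword: {"pos": ..., "gloss": ...}}."""
    root = ET.fromstring(xml_text.strip())
    entries = {}
    for entry in root.iter("entry"):
        headword = entry.findtext("headword", default="").strip().lower()
        if not headword:
            continue  # skip entries without a usable headword
        entries[headword] = {
            "pos": entry.findtext("pos", default="").strip(),
            "gloss": entry.findtext("gloss", default="").strip(),
        }
    return entries


entries = load_entries(SAMPLE_XML)
```

The resulting headword set could serve as a word list for matching against social media text, although sentiment labels would still have to come from elsewhere.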

  • I am a bit familiar with TensorFlow via practice on Google Colab.