Related papers: A Simple and Efficient Probabilistic Language mode…

200 papers

Natural language processing (NLP) techniques have become mainstream in the past decade. Most of these advances are attributed to the processing of a single language. More recently, with the extensive growth of social media platforms focus…

Computation and Language · Computer Science 2022-01-12 Ramchandra Joshi, Raviraj Joshi

Language identification of social media text has been an interesting problem of study in recent years. Social media messages in non-English-speaking states are predominantly code-mixed. Prior knowledge by pre-training contextual…

Computation and Language · Computer Science 2021-07-05 Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim

Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding…

With the constant growth of the World Wide Web and, accordingly, of the number of documents in different languages, the need for reliable language detection tools has increased as well. Platforms such as Twitter with predominantly short texts…

Computation and Language · Computer Science 2016-08-31 Ivana Balazevic, Mikio Braun, Klaus-Robert Müller

Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms to be able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity,…

Information Retrieval · Computer Science 2015-12-03 Cedric De Boom, Steven Van Canneyt, Steven Bohez, Thomas Demeester, Bart Dhoedt

Language identification of social media text remains a challenging task due to properties like code-mixing and inconsistent phonetic transliterations. In this paper, we present a supervised learning approach for language…

Computation and Language · Computer Science 2018-06-28 Soumil Mandal, Sourya Dipta Das, Dipankar Das

Social media platforms such as Twitter and Facebook are becoming popular in multilingual societies. This trend induces portmanteau of South Asian languages with English. The blend of multiple languages as code-mixed data has recently become…

Computation and Language · Computer Science 2024-03-08 Rajat Singh, Nurendra Choudhary, Manish Shrivastava

Mixed-language data is one of the difficult yet less explored domains of natural language processing. Most research in fields like machine translation or sentiment analysis assumes monolingual input. However, people who are capable of using…

Neural and Evolutionary Computing · Computer Science 2014-12-23 Joseph Chee Chang, Chu-Cheng Lin

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express…

Computation and Language · Computer Science 2015-06-16 Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen

Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications…

Information Retrieval · Computer Science 2016-07-05 Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call multilingual neural language models, takes sentences of multiple languages as…

Computation and Language · Computer Science 2018-09-10 Takashi Wada, Tomoharu Iwata

The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called Code-Mixing. This is more evident in multilingual societies. In this paper, we have developed a system for SemEval 2020: Task 9 on…

Computation and Language · Computer Science 2020-10-12 Sunil Gundapu, Radhika Mamidi

The task of written language identification typically involves the detection of the languages present in a sample of text. Moreover, a sequence of text may not belong to a single inherent language but may also be a mixture of text written in…

Computation and Language · Computer Science 2020-07-14 Mohd Zeeshan Ansari, Tanvir Ahmad, Ana Fatima

Conventional text classification models make a bag-of-words assumption, reducing text to word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for…

Computation and Language · Computer Science 2018-01-22 Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

Social media platforms have grown into an important medium to spread information about an event published by the traditional media, such as news articles. Grouping such diverse sources of information that discuss the same topic in varied…

Computation and Language · Computer Science 2017-10-26 Aditya Mogadala, Dominik Jung, Achim Rettinger

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning…

Computation and Language · Computer Science 2020-05-11 Martina Toshevska, Frosina Stojanovska, Jovan Kalajdjieski

While word embeddings are currently predominant for natural language processing, most existing models learn them solely from their contexts. However, these context-based word embeddings are limited since not all words' meanings can be…

Computation and Language · Computer Science 2016-08-23 Jifan Chen, Kan Chen, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Zheng Zhang

Code-mixed discourse combines multiple languages in a single text. It is commonly used in informal discourse in countries with several official languages, but also in many other countries in combination with English or neighboring…

Computation and Language · Computer Science 2025-04-16 Anjali Yadav, Tanya Garg, Matej Klemen, Matej Ulcar, Basant Agarwal, Marko Robnik Sikonja

We consider probabilistic topic models and more recent word embedding techniques from a perspective of learning hidden semantic representations. Inspired by a striking similarity of the two approaches, we merge them and learn probabilistic…

Computation and Language · Computer Science 2017-11-15 Anna Potapenko, Artem Popov, Konstantin Vorontsov