Related papers: A Robust Cybersecurity Topic Classification Tool

Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling

Due to the variety of cyber-attacks or threats, the cybersecurity community enhances the traditional security control mechanisms to an advanced level so that automated tools can encounter potential security threats. Very recently, Cyber…

Machine Learning · Computer Science 2022-11-15 Md Imran Hossen , Ashraful Islam , Farzana Anowar , Eshtiak Ahmed , Mohammad Masudur Rahman , Xiali Hei

Classifying Web Exploits with Topic Modeling

This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. By using a dataset comprised…

Cryptography and Security · Computer Science 2017-10-17 Jukka Ruohonen

Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models

Understanding the attack patterns associated with a cyberattack is crucial for comprehending the attacker's behaviors and implementing the right mitigation measures. However, majority of the information regarding new attacks is typically…

Machine Learning · Computer Science 2024-12-02 Weiqiu You , Youngja Park

An Automated Text Categorization Framework based on Hyperparameter Optimization

A great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackle using a text classifier. A text classifier consists of several subprocesses,…

Computation and Language · Computer Science 2017-09-18 Eric S. Tellez , Daniela Moctezuma , Sabino Miranda-Jímenez , Mario Graff

Computer users are generally faced with difficulties in making correct security decisions. While an increasingly fewer number of people are trying or willing to take formal security training, online sources including news, security blogs,…

Cryptography and Security · Computer Science 2020-06-29 Tingmin Wu , Wanlun Ma , Sheng Wen , Xin Xia , Cecile Paris , Surya Nepal , Yang Xiang

Evaluation and Improvement of Chatbot Text Classification Data Quality Using Plausible Negative Examples

We describe and validate a metric for estimating multi-class classifier performance based on cross-validation and adapted for improvement of small, unbalanced natural-language datasets used in chatbot design. Our experiences draw upon…

Information Retrieval · Computer Science 2019-06-06 Kit Kuksenok , Andriy Martyniv

CTM -- A Model for Large-Scale Multi-View Tweet Topic Classification

Automatically associating social media posts with topics is an important prerequisite for effective search and recommendation on many social media platforms. However, topic classification of such posts is quite challenging because of (a) a…

Computation and Language · Computer Science 2022-05-04 Vivek Kulkarni , Kenny Leung , Aria Haghighi

Creating a Cybersecurity Concept Inventory: A Status Report on the CATS Project

We report on the status of our Cybersecurity Assessment Tools (CATS) project that is creating and validating a concept inventory for cybersecurity, which assesses the quality of instruction of any first course in cybersecurity. In fall…

Cryptography and Security · Computer Science 2017-06-19 Alan T. Sherman , Linda Oliva , David DeLatte , Enis Golaszewski , Michael Neary , Konstantinos Patsourakos , Dhananjay Phatak , Travis Scheponik , Geoffrey L. Herman , Julia Thompson

Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

With the recent rise of toxicity in online conversations on social media platforms, using modern machine learning algorithms for toxic comment detection has become a central focus of many online applications. Researchers and companies have…

Artificial Intelligence · Computer Science 2020-03-30 Ameya Vaidya , Feng Mai , Yue Ning

Contextualized Topic Coherence Metrics

The recent explosion in work on neural topic modeling has been criticized for optimizing automated topic evaluation metrics at the expense of actual meaningful topic identification. But human annotation remains expensive and time-consuming.…

Computation and Language · Computer Science 2023-05-25 Hamed Rahimi , Jacob Louis Hoover , David Mimno , Hubert Naacke , Camelia Constantin , Bernd Amann

A Machine Learning Approach to Comment Toxicity Classification

Now-a-days, derogatory comments are often made by one another, not only in offline environment but also immensely in online environments like social networking websites and online communities. So, an Identification combined with Prevention…

Computation and Language · Computer Science 2019-03-19 Navoneel Chakrabarty

Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study

With surge in online platforms, there has been an upsurge in the user engagement on these platforms via comments and reactions. A large portion of such textual comments are abusive, rude and offensive to the audience. With machine learning…

Computation and Language · Computer Science 2021-08-17 Ayush Kumar , Pratik Kumar

Machine learning approach for text and document mining

Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a…

Information Retrieval · Computer Science 2014-06-09 Vishwanath Bijalwan , Pinki Kumari , Jordan Pascual , Vijay Bhaskar Semwal

JCTC: A Large Job posting Corpus for Text Classification

The absence of an appropriate text classification corpus makes the massive amount of online job information unusable for labor market analysis. This paper presents JCTC, a large job posting corpus for text classification. In JCTC…

Information Retrieval · Computer Science 2017-06-13 Haoyu Xu , Chongyang Gu , Han Zhou , Sengpan Kou , Junjie Zhang

TCAB: A Large-Scale Text Classification Attack Benchmark

We introduce the Text Classification Attack Benchmark (TCAB), a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers. TCAB includes 1.5 million attack instances, generated by twelve…

Machine Learning · Computer Science 2022-10-25 Kalyani Asthana , Zhouhang Xie , Wencong You , Adam Noack , Jonathan Brophy , Sameer Singh , Daniel Lowd

Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning

Given the constant growth and increasing sophistication of cyberattacks, cybersecurity can no longer rely solely on traditional defense techniques and tools. Proactive detection of cyber threats has become essential to help security teams…

Cryptography and Security · Computer Science 2025-12-01 Sebastião Alves de Jesus Filho , Gustavo Di Giovanni Bernardo , Paulo Henrique Ribeiro Gabriel , Bruno Bogaz Zarpelão , Rodrigo Sanches Miani

LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification

Verifying the credibility of Cyber Threat Intelligence (CTI) is essential for reliable cybersecurity defense. However, traditional approaches typically treat this task as a static classification problem, relying on handcrafted features or…

Cryptography and Security · Computer Science 2025-07-16 Fengxiao Tang , Huan Li , Ming Zhao , Zongzong Wu , Shisong Peng , Tao Yin

Text Categorization via Similarity Search: An Efficient and Effective Novel Algorithm

We present a supervised learning algorithm for text categorization which has brought the team of authors the 2nd place in the text categorization division of the 2012 Cybersecurity Data Mining Competition (CDMC'2012) and a 3rd prize…

Information Retrieval · Computer Science 2013-07-11 Hubert Haoyang Duan , Vladimir Pestov , Varun Singla

Fake News Detection Using Majority Voting Technique

Due to the evolution of the Web and social network platforms it becomes very easy to disseminate the information. Peoples are creating and sharing more information than ever before, which may be misleading, misinformation or fake…

Computation and Language · Computer Science 2022-03-29 Dharmaraj R. Patil

Conical Classification For Computationally Efficient One-Class Topic Determination

As the Internet grows in size, so does the amount of text based information that exists. For many application spaces it is paramount to isolate and identify texts that relate to a particular topic. While one-class classification would be…

Artificial Intelligence · Computer Science 2021-11-02 Sameer Khanna