Extracting Keywords from Crowdsourced Collections Project

Code

Digital Scholarship at Oxford (DiSc) / Official repository: https://github.com/Digital-Scholarship-Oxford/crowdsourced-data-tools

Description

These are AI tools developed at the University of Oxford for analysing crowdsourced text collections. They identify keywords, topics, and categories of terms in the text.

Keyword Extraction (KE) is used to extract keywords directly from a text: words or phrases that both occur in the document and indicate what the text is about.
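
As a rough illustration of the idea, the sketch below ranks the terms of one document by TF-IDF, so that words frequent in that document but rare in the rest of the collection score highest. It is not taken from the repository: TF-IDF is only one of several possible KE techniques, and the stopword list, the function names (tokenize, extract_keywords) and the sample texts are all invented for this example.

```python
import math
import re
from collections import Counter

# A small illustrative stopword list (an assumption, not the project's own).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "was", "with", "from"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def extract_keywords(documents, doc_index, top_n=2):
    """Rank the terms of one document by TF-IDF against the whole collection."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    term_freq = Counter(tokenized[doc_index])
    scores = {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by score (descending), then alphabetically to keep ties stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Invented sample texts standing in for crowdsourced contributions.
documents = [
    "The ration book was stamped each week; without a ration book no family could buy sugar.",
    "The evacuation train took the children from the city to a farm.",
    "A family on the farm shared sugar with the children each week.",
]

print(extract_keywords(documents, 0))  # ['book', 'ration']
```

In the first text, "ration" and "book" each occur twice and appear in no other text, so they outrank words such as "sugar" or "family" that are shared with the rest of the collection.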

Named Entity Recognition (NER) is used to identify segments of information referenced in a text and classify them into pre-established categories, such as ‘person’, ‘organisation’ and ‘location’.
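
The input and output of NER can be sketched with a simple dictionary (gazetteer) lookup, shown below. This is a deliberate simplification and an assumption of this example: NER tools are more commonly built on trained statistical or neural models, and nothing here reflects how the repository implements it. The gazetteer entries, the sample sentence and the function name recognise_entities are invented for illustration.

```python
import re

# Hypothetical gazetteer: surface forms mapped to pre-established categories.
GAZETTEER = {
    "Mary Evans": "person",
    "Red Cross": "organisation",
    "Oxford": "location",
    "Coventry": "location",
}


def recognise_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface form, category) for each gazetteer match."""
    # Try longer names first so multi-word entries win over shorter ones.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(), gazetteer[m.group()])
        for m in pattern.finditer(text)
    ]


# Invented sample sentence.
text = "Mary Evans left Coventry to work for the Red Cross in Oxford."
entities = recognise_entities(text)
for start, end, surface, category in entities:
    print(f"{surface} [{category}] at {start}-{end}")
```

Each result pairs a segment of the text with its category and character offsets, which is the general shape of output a NER step produces regardless of the model behind it.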

Topic Modelling (TM) is used to uncover hidden thematic structures in large collections of textual documents, providing an automatic means to organise, understand and summarise them. It is a type of unsupervised machine learning: it analyses unstructured text without relying on labelled input.
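
To make the unsupervised aspect concrete, the following is a minimal sketch of one well-known topic model, Latent Dirichlet Allocation (LDA), fitted by collapsed Gibbs sampling using only the Python standard library. The choice of LDA is an assumption of this example, not a statement about the repository's method; the function names (fit_lda, top_words), the hyperparameter values and the tiny pre-tokenised sample documents are likewise invented.

```python
import random


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    # Count tables: topic totals per document and word totals per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by its fit to the document and to the word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Resample the token's topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, top_n=3):
    """List the most frequent words of each topic (ties broken alphabetically)."""
    return [
        [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for counts in topic_word
    ]


# Invented, pre-tokenised sample documents; note that no labels are supplied.
docs = [
    "ration book sugar queue ration shop".split(),
    "sugar ration queue shop book".split(),
    "train evacuation children station train".split(),
    "children station evacuation train farm".split(),
]
topic_word = fit_lda(docs, num_topics=2)
topics = top_words(topic_word)
for number, words in enumerate(topics):
    print(f"Topic {number}: {', '.join(words)}")
```

The model receives only the words of each document and the number of topics to look for. Because the sampler is stochastic, the word groupings it finds depend on the seed and the hyperparameters, and on a collection this small they should be read as an illustration rather than a result.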

License

AGPLv3