- Linguistic Analysis:
    - tokenization
    - splitting text into sentences
    - part-of-speech tagging
    - identifying phrases and other syntactic patterns
- Named Entity Recognition: identifying which persons, organizations, locations, etc. are mentioned in the text
- Term Extraction: identifying the important conceptual vocabulary of a text
- Relation Extraction: identifying relations between terms
- Knowledge Management: formal representation and querying of knowledge, e.g. with XML or RDF
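
As a rough illustration of the first two linguistic-analysis steps (tokenization and sentence splitting), here is a minimal regex-based sketch in Python. The names `split_sentences` and `tokenize` are hypothetical, and the sentence heuristic (split after `.`, `!` or `?` when the next word is capitalized) ignores abbreviations such as "Dr."; real tokenizers handle many more cases.

```python
import re

# Naive sentence boundary: whitespace preceded by ., ! or ? and followed by a capital letter.
# Assumption: no abbreviation handling, so "Dr. Smith" would be split incorrectly.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# A token is a word (allowing internal hyphens/apostrophes) or a single punctuation mark.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def split_sentences(text):
    """Split text into sentences using the punctuation heuristic above."""
    return [s for s in _SENT_BOUNDARY.split(text.strip()) if s]

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)

text = "Marie Curie worked in Paris. She won two Nobel Prizes!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```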
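
Part-of-speech tagging and phrase identification can be sketched together: a lookup tagger assigns Penn Treebank-style tags from a tiny hand-written lexicon (a hypothetical stand-in for a statistically trained tagger), and a chunker then collects maximal noun phrases matching the pattern `DT? JJ* NN+`. `LEXICON`, `pos_tag` and `noun_phrases` are illustrative names, not a library API.

```python
# Hypothetical toy lexicon; a real tagger would be trained on an annotated corpus.
LEXICON = {
    "the": "DT", "a": "DT",
    "new": "JJ", "large": "JJ", "statistical": "JJ",
    "model": "NN", "parser": "NN", "language": "NN", "corpus": "NN",
    "uses": "VBZ", "builds": "VBZ",
}

def pos_tag(tokens):
    """Tag each token by lexicon lookup; unknown words default to NN (noun)."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]

def noun_phrases(tagged):
    """Collect maximal spans matching DT? JJ* NN+ (a simple noun-phrase pattern)."""
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NN":   # one or more nouns
            k += 1
        if k > j:  # at least one noun, so the span is a noun phrase
            phrases.append(" ".join(tok for tok, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases

tagged = pos_tag("The new parser uses a large language model".split())
print(tagged)
print(noun_phrases(tagged))
```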
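
A very simple form of named entity recognition is dictionary (gazetteer) lookup. The sketch below assumes a small hypothetical `GAZETTEER` and uses greedy longest match, so "University of Paris" is labelled as one organization instead of yielding the location "Paris". Practical NER systems usually rely on trained sequence models, since a gazetteer cannot cover unseen names.

```python
# Hypothetical gazetteer: lower-cased token sequences mapped to entity types.
GAZETTEER = {
    ("marie", "curie"): "PERSON",
    ("university", "of", "paris"): "ORGANIZATION",
    ("paris",): "LOCATION",
    ("warsaw",): "LOCATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def find_entities(tokens):
    """Greedy longest-match lookup of token spans in the gazetteer."""
    entities, i = [], 0
    lowered = [t.lower() for t in tokens]
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + length]))
            if label:
                entities.append((" ".join(tokens[i:i + length]), label))
                i += length
                break
        else:  # no span starting at i matched
            i += 1
    return entities

tokens = "Marie Curie was born in Warsaw and studied at the University of Paris".split()
print(find_entities(tokens))
```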
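
For term extraction, one common baseline is to take stopword-filtered unigrams and bigrams as candidate terms and rank them per document by TF-IDF, so that expressions frequent in one document but rare across the collection score highest. The stopword list, the candidate pattern and the `tfidf_terms` helper are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "for", "on", "with"}

def candidates(text):
    """Return unigram and bigram candidates that neither start nor end with a stopword."""
    terms = []
    # Split at punctuation first so bigrams never span sentence or clause boundaries.
    for chunk in re.split(r"[.!?;:,]", text.lower()):
        words = re.findall(r"[a-z]+", chunk)
        grams = [(w,) for w in words] + list(zip(words, words[1:]))
        terms += [" ".join(g) for g in grams
                  if g[0] not in STOPWORDS and g[-1] not in STOPWORDS]
    return terms

def tfidf_terms(docs, top_k=3):
    """Rank each document's candidates by tf * log(N / df); ties break alphabetically."""
    counts = [Counter(candidates(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    ranked = []
    for c in counts:
        scores = {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return ranked

docs = [
    "Gene expression controls cell growth. Gene expression varies.",
    "The parser builds a parse tree. The parser is fast.",
    "Cell growth is fast.",
]
print(tfidf_terms(docs))
```

Here "cell growth" is down-weighted in the first document because it also occurs in the third, which is the intended effect of the IDF factor.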
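
Relation extraction between terms can start from lexico-syntactic patterns in the spirit of Hearst patterns, where "Y such as X1, X2 and X3" suggests that each Xi is a kind of Y. The two regular expressions below are hypothetical and assume the enumeration or definition spans the whole sentence; they trade recall for simplicity.

```python
import re

# Hypothetical surface patterns; both must cover the whole sentence
# (an optional final period is allowed).
SUCH_AS = re.compile(r"^(?P<y>\w[\w ]*?),? such as (?P<xs>[\w ,]+?)\.?$", re.IGNORECASE)
IS_A = re.compile(r"^(?P<x>\w[\w ]*?) is an? (?P<y>\w[\w ]*?)\.?$", re.IGNORECASE)
# Enumeration separators: "and"/"or" (optionally after a comma), or a plain comma.
LIST_SEP = re.compile(r",?\s+(?:and|or)\s+|,\s*")

def extract_relations(sentence):
    """Return (term, relation, term) triples matched by the patterns above."""
    s = sentence.strip()
    m = SUCH_AS.match(s)
    if m:
        items = [x.strip() for x in LIST_SEP.split(m.group("xs"))]
        return [(x, "is_a", m.group("y")) for x in items if x]
    m = IS_A.match(s)
    if m:
        return [(m.group("x"), "is_a", m.group("y"))]
    return []

print(extract_relations("European capitals such as Paris, Warsaw and Rome."))
print(extract_relations("Paris is a city."))
```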
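
For the knowledge-management step, extracted entities and relations can be stored as RDF triples. This stdlib-only sketch serializes them as N-Triples and as RDF/XML, and answers a single triple-pattern query with `None` as a wildcard, a much simplified analogue of a SPARQL basic graph pattern. The `http://example.org/` namespace and the `bornIn` predicate are made up for the example; a library such as rdflib would normally handle parsing, serialization and full SPARQL querying.

```python
import xml.etree.ElementTree as ET

# Hypothetical example namespace; RDF is the W3C RDF syntax namespace.
EX = "http://example.org/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)

def iri(name):
    """Expand a local name into an IRI in the example namespace."""
    return EX + name.replace(" ", "_")

def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

def split_iri(iri_str):
    """Split an IRI into namespace and local name at the last '#' or '/'."""
    cut = max(iri_str.rfind("#"), iri_str.rfind("/")) + 1
    return iri_str[:cut], iri_str[cut:]

def to_rdfxml(triples):
    """Serialize IRI-only triples as RDF/XML, one rdf:Description per triple."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for s, p, o in triples:
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": s})
        ns, local = split_iri(p)
        ET.SubElement(desc, f"{{{ns}}}{local}", {f"{{{RDF}}}resource": o})
    return ET.tostring(root, encoding="unicode")

def query(triples, s=None, p=None, o=None):
    """Match a single triple pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

triples = [
    (iri("Marie Curie"), RDF_TYPE, iri("Person")),
    (iri("Marie Curie"), iri("bornIn"), iri("Warsaw")),
    (iri("Warsaw"), RDF_TYPE, iri("Location")),
]
print(to_ntriples(triples))
print(to_rdfxml(triples))
# Which subjects are of type Person?
print([s for s, _, _ in query(triples, p=RDF_TYPE, o=iri("Person"))])
```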