Training Course

Foundations of Text Mining and Natural Language Processing

Course intended for

Text constitutes at least 70% of all data generated in IT systems. Such data is rarely used for analytical purposes or knowledge discovery. This course covers the problems related to the processing and analysis of textual data. The course is addressed to:

  • programmers who wish to apply Text Mining knowledge discovery methods in their systems,

  • analysts who wish to extend their analytical toolkit with Text Mining tools,

  • those interested in applying statistical tools and machine learning methods to Text Mining problems.

Basic programming knowledge in any language is required (for example Python, R, MATLAB).

Course objective

Participants will learn a number of tools designed for working on Text Mining and NLP problems. Examples of their use will be presented, covering the majority of topics in that domain. The course will be conducted in Python, the language most commonly used for text analytics.

Course strengths

The course provides many practical, work-related examples. Participants become familiar with Text Mining analysis and the ways it can be applied in their own work.


Prerequisites

Basic programming experience and experience in data analysis.

Course parameters

3 days of 8 hours each, combining lectures and workshops (including 1 hour of breaks each day).

Course Agenda
  1. Introduction and definitions
    • Text Mining
    • NLP
    • IR
  2. Toolbox
    • Working with text
      • string operations
      • regex
    • Overview of basic tools
      • pandas
      • scikit-learn
      • NLTK
      • spaCy
    • Loading data in different formats
    • Web scraping
  3. Basic text processing
    • Tokenization
    • Normalization
    • Stop word removal
    • Stemming
    • Lemmatization
  4. Visualization
  5. Text representations
    • Document-term matrix
    • Bag of words
    • TF-IDF
    • word2vec
    • doc2vec
  6. Topic Modelling
    • LSI
    • LDA
  7. Text summarization
  8. Text similarity
    • Word similarity
    • Document similarity
  9. Clustering
    • K-means
    • Hierarchical Agglomerative Clustering
  10. Classification
    • Naive Bayes
    • SVM
  11. Part-of-speech tagging
  12. Sentiment analysis
    • Dictionary approach
    • Supervised learning approach
  13. Named Entity Recognition
  14. Semantic analysis
  15. Parsing
    • Shallow parsing
    • Dependency-based parsing
  16. Resources

Course Length

3 days

Order Course

899 EUR (online) per participant

Call Us
(+48) 22 203 56 00
Nowogrodzka 62C, Warsaw, Poland
UTC / GMT +1