Course intended for
Text constitutes at least 70% of all data generated in IT systems. Such data is rarely used for analytical purposes or knowledge discovery. This course covers the problems related to the processing and analysis of textual data. The course is addressed to:
- programmers who wish to apply Text Mining knowledge discovery methods in their systems,
- analysts who wish to extend their analytical toolkit with Text Mining tools,
- those interested in applying statistical tools and machine learning methods to text data.
Basic programming knowledge in any language is required (for example Python, R, MATLAB).
Course objective
Participants will learn a number of tools designed for working on Text Mining and NLP problems. Examples of their use will be presented, covering the majority of topics in that domain. The course will be conducted in Python, the language most commonly used for text analytics.
Course strengths
Many examples of practical, work-related applications will be provided. Participants will become familiar with Text Mining analysis and with the possibilities of using it in their own work.
Requirements
Basic programming experience and some experience in data analysis.
Course parameters
3 days of 8 hours each (including 1 hour of breaks per day) of lectures and workshops.
Course Agenda
- Introduction and definitions
  - Text Mining
  - NLP
  - IR
- Toolbox
  - Working with text
    - string operations
    - regex
  - Overview of basic tools
    - pandas
    - scikit-learn
    - NLTK
    - spaCy
  - Loading data in different formats
  - Web scraping
- Working with text
  - Basic text processing
    - Tokenization
    - Normalization
    - Stopwords removal
    - Stemming
    - Lemmatization
  - Visualization
  - Text representations
    - Document-term matrix
    - Bag of words
    - TF-IDF
    - word2vec
    - doc2vec
  - Topic Modelling
    - LSI
    - LDA
  - Text summarization
  - Text similarity
    - Word similarity
    - Document similarity
  - Clustering
    - K-means
    - Hierarchical Agglomerative Clustering
  - Classification
    - Naive Bayes
    - SVM
  - Part of Speech tagging
  - Sentiment analysis
    - Dictionary approach
    - Supervised learning approach
  - Named Entity Recognition
  - Semantic analysis
  - Parsing
    - Shallow parsing
    - Dependency-based parsing
- Resources
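
To give a flavour of the agenda, the basic text processing steps (tokenization, normalization, stopwords removal) and two of the text representations (bag of words, TF-IDF) can be sketched in plain Python. This is only a minimal illustration and not the course material itself: the function names, the tiny stopword list and the sample sentences are assumptions made for this example, and the TF-IDF weighting shown (term frequency times log(N / document frequency)) is just one common variant. Library implementations such as scikit-learn's `TfidfVectorizer` use a slightly different, smoothed formula by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch;
# NLTK and spaCy ship much fuller lists).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Normalize to lowercase, split into alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(docs):
    """Return one token-count mapping (Counter) per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    """Weight each term by tf * log(N / df), one common TF-IDF variant."""
    bags = bag_of_words(docs)
    n_docs = len(bags)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical sample corpus.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs are animals.",
]
bags = bag_of_words(docs)
weights = tf_idf(docs)
```

In this sample corpus, "sat" appears in two of the three documents and "cat" in only one, so "cat" receives the higher TF-IDF weight in the first document. Note also that "cat" and "cats" are treated as different terms here, which is the kind of problem stemming and lemmatization address.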