Emerging Trends in Software Engineering

University of Oulu



Source code

See https://github.com/M3SOulu/TrendMining


Outline - Requirements

This document gives an outline for the trend mining report you need to produce in the group exercises. 


Length of the document: the suggested length is 10-30 pages. The number of figures greatly affects the report length.



Start to think about your own topic


Report structure

Here are the headings, with some suggestions on the content, for the group work report.

1. Introduction

- What is this topic about? (The reader might not know what, e.g., Sentiment Analysis or Cryptocurrencies is.)
- Why is this topic important (globally)?
- Why is this topic important (to your group)?

2. Search Strings

- What search strings were used?
- Are the search strings the same for all data sources?
- What was the process to form a correct search string?
- After running all the analyses, are you able to find an even better search string?

3. Stopwords

- Report the list of stop words you used. Please exclude common English stop words and report only your context-specific stop words; for example, “software” could be a stop word in some topics but not in others.
- Explain the process of coming up with the correct stop words.
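
The split between common and context-specific stop words can be sketched as follows. This is a minimal illustration only: the tiny common list, the context-specific list, and the function name `remove_stopwords` are all assumptions made for the example, not the lists or code used in the course material.

```python
import re

# A tiny stand-in for a common English stop word list (an assumption;
# a real analysis would use a full list from a text mining library).
COMMON_STOPWORDS = {"a", "an", "and", "the", "this", "of", "in", "is", "for", "to", "on"}

# Hypothetical context-specific stop words for a software-related topic.
CONTEXT_STOPWORDS = {"software", "paper", "study"}


def remove_stopwords(text, extra_stopwords=CONTEXT_STOPWORDS):
    """Lowercase the text, split it into words, and drop all stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stopwords = COMMON_STOPWORDS | set(extra_stopwords)
    return [t for t in tokens if t not in stopwords]


cleaned = remove_stopwords(
    "This paper is a study of sentiment analysis in software reviews"
)
print(cleaned)
```

Only the contents of `CONTEXT_STOPWORDS` would need to be reported, since the common list is the same for every topic.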

4. Dendrogram and Hierarchical cluster

- Show your dendrogram cluster figure (or a part of it if it is too big)
- Explain what is in the figure
- Did you find this technique useful?
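
To help explain what a dendrogram shows, here is a minimal sketch of hierarchical (agglomerative) clustering. It assumes single linkage and Jaccard distance on term sets, and the four documents are invented; the course scripts may cluster different items with a different distance measure.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two term sets: 0 when identical, 1 when disjoint."""
    return 1 - len(a & b) / len(a | b)


def cluster_distance(c1, c2, docs):
    """Single linkage: the smallest distance between any two members."""
    return min(jaccard_distance(docs[a], docs[b]) for a in c1 for b in c2)


def single_linkage(docs):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [(name,) for name in sorted(docs)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], docs),
        )
        merges.append((c1, c2, cluster_distance(c1, c2, docs)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical documents, each reduced to a set of terms.
docs = {
    "doc1": {"bitcoin", "blockchain", "mining"},
    "doc2": {"bitcoin", "blockchain", "wallet"},
    "doc3": {"sentiment", "twitter", "opinion", "emotion"},
    "doc4": {"sentiment", "twitter", "emotion"},
}

for left, right, distance in single_linkage(docs):
    print(left, "+", right, "at distance", distance)
```

Each printed merge corresponds to one join in the dendrogram, and the distance is the height at which the two branches meet.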

5. Word clouds

- Show word clouds and comparison word clouds from all 3 data sources (Scopus, Twitter, StackOverflow)
- Report what the differences are between the sources, between old and new sources, and between popular (top cited/voted/retweeted) sources
- Did you find this technique useful?
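
A comparison between two sources can also be expressed numerically. The sketch below ranks terms by how much more frequent they are in one source than in the other, which is roughly what a comparison word cloud visualizes. The token lists and function names are hypothetical, and the exact weighting used by the course scripts may differ.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each term's share of all tokens in one source."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def distinctive_terms(tokens_a, tokens_b, top_n=3):
    """Terms that are relatively more frequent in source A than in source B."""
    freq_a = relative_frequencies(tokens_a)
    freq_b = relative_frequencies(tokens_b)
    diff = {t: freq_a[t] - freq_b.get(t, 0.0) for t in freq_a}
    ranked = sorted(diff.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, d in ranked[:top_n] if d > 0]


# Hypothetical, already-cleaned tokens from two data sources.
scopus_tokens = ["model", "model", "evaluation", "dataset", "error"]
stackoverflow_tokens = ["error", "error", "install", "model", "dataset"]

print("Scopus:", distinctive_terms(scopus_tokens, stackoverflow_tokens))
print("StackOverflow:", distinctive_terms(stackoverflow_tokens, scopus_tokens))
```

The same function could be applied to old versus new sources, or to popular versus other sources, by changing which two token lists are passed in.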

6. Time Lines and Popularity

- Show and report time lines where the number of sources over time can be seen
- Do a statistical analysis of popularity. What factors explain popularity?
- Did you find this technique useful?
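
The data behind a time line is simply the number of sources per year. A minimal sketch, assuming each source has a publication year (the years below are invented):

```python
from collections import Counter


def sources_per_year(years):
    """Count sources per year, including years in the range with no sources."""
    counts = Counter(years)
    return {year: counts.get(year, 0) for year in range(min(years), max(years) + 1)}


# Hypothetical publication years of the collected sources.
publication_years = [2015, 2017, 2017, 2018, 2018, 2018]

timeline = sources_per_year(publication_years)
for year, n in timeline.items():
    # A plain-text bar per year; a real report would use a proper plot.
    print(year, "#" * n)
```

Filling in the empty years keeps gaps visible, so a year without any sources is not silently dropped from the time line.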

7. Top-5 Popular sources

- List and explain the top-5 sources for each of the 3 data sources. What topics are the top-5 sources about, and what are the commonalities and differences between them?
- Did you find this technique useful?

8. Interactive LDA clustering

- Study the interactive LDA clusters. Take some screenshots and report 3-5 findings that you find interesting
- Did you find this technique useful?

9. Hot and cold topics (with LDA clustering)

- Show the hot and cold topics with graphs and by listing their top-10 terms
- Did you find this technique useful?
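
One common way to label topics as hot or cold is to fit a linear trend to each topic's yearly share of the documents: a rising share suggests a hot topic, a falling share a cold one. The sketch below assumes that definition and uses invented topic shares; the course scripts may use a different criterion, such as a significance test on the slope.

```python
def trend_slope(years, shares):
    """Least-squares slope of a topic's share over the years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    variance = sum((x - mean_x) ** 2 for x in years)
    return covariance / variance


# Hypothetical yearly topic shares, e.g. averaged LDA topic probabilities.
years = [2015, 2016, 2017, 2018]
topic_shares = {
    "topic_a": [0.10, 0.15, 0.20, 0.25],
    "topic_b": [0.30, 0.25, 0.20, 0.15],
}

slopes = {name: trend_slope(years, shares) for name, shares in topic_shares.items()}
for name, slope in slopes.items():
    label = "hot" if slope > 0 else "cold"
    print(name, label, round(slope, 3))
```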

10. Discussion and conclusions

Summarize and discuss your findings. What are the most notable findings? What trend mining techniques did you find the most beneficial? Are there some techniques that could be added? Are there interesting directions for future research? Is there a call for action (now what)?

Exercise material

General Introduction to R from Tilastollisen data-analyysin perusteet tietojenkäsittelytieteilijöille, 5 op
Example material (Note: This is outdated. It only includes Scopus data. Also, author analysis is no longer recommended.)

Single Exercises

Updated 21 Sep 18 at 16:28
