Provides classes for dealing with Stack Overflow artifacts.
It also provides means to serialize and deserialize them from JSON files.
The main class to consider is ch.usi.inf.reveal.parsing.artifact.StackOverflowArtifact, representing a Stack Overflow discussion.
The main way to obtain a Stack Overflow discussion is to deserialize it from a JSON file, using ArtifactSerializer.
import ch.usi.inf.reveal.parsing.artifact._
val jsonFilePath = /*path to json file*/
val serializer = new ArtifactSerializer()
val discussion = serializer.deserializeFromFile(jsonFilePath)
You can now inspect the discussion, for example by getting the question (a StackOverflowQuestion object) or the answers (a Seq of StackOverflowAnswer):
scala> val question = discussion.question
question: ch.usi.inf.reveal.parsing.artifact.StackOverflowQuestion = ...
scala> val answers = discussion.answers
answers: Seq[ch.usi.inf.reveal.parsing.artifact.StackOverflowAnswer] = ...
To get the object representing the poster of the question (a StackOverflowUser, wrapped in an Option), use the owner field:
scala> val posterUser = question.owner
posterUser: Option[ch.usi.inf.reveal.parsing.artifact.StackOverflowUser] = Some(StackOverflowUser(...))
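Because owner is an Option, calling code has to handle the case where no user is present. The following is a minimal, self-contained sketch of that pattern. It does not use the real library classes: User and its displayName field are hypothetical stand-ins for StackOverflowUser, whose actual fields may differ.

```scala
object OwnerSketch {
  // Hypothetical stand-in for StackOverflowUser; the real class and its
  // fields are not shown in this documentation and may differ.
  final case class User(displayName: String)

  // Builds a short description of the poster, covering both the case where
  // an owner is present (Some) and the case where it is missing (None).
  def describeOwner(owner: Option[User]): String =
    owner
      .map(u => s"posted by ${u.displayName}")
      .getOrElse("poster unknown")
}
```

With the real API, the same map/getOrElse chain could presumably be applied directly to question.owner, using whichever fields StackOverflowUser actually exposes.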
You can also easily get the comments (objects of StackOverflowComment), for example the ones posted for the question:
scala> val questionComments = question.comments
questionComments: Seq[ch.usi.inf.reveal.parsing.artifact.StackOverflowComment] = ...
The easy way to get all information units (ch.usi.inf.reveal.parsing.units.InformationUnit) and the HASTs (classes implementing the ch.usi.inf.reveal.parsing.model.HASTNode trait) for an element (question or answer or comment) is:
scala> val allUnits = discussion.units
allUnits: Seq[ch.usi.inf.reveal.parsing.units.InformationUnit] = List(...)
scala> val astNodes = allUnits.map { _.astNode }
astNodes: Seq[ch.usi.inf.reveal.parsing.model.ASTNode] =
For further information about the information units, the HAST and the model, please check the documentation of ch.usi.inf.reveal.parsing.model and ch.usi.inf.reveal.parsing.units.
Provides base classes to model HAST nodes for Stack Overflow artifacts.
Every node of the HAST implements the HASTNode trait.
Depending on the specific fragment, nodes implement traits in different packages. For example:
Special nodes are used for various purposes, and are contained in this package:
Provides classes representing information units (structured and textual units) and meta information (like tf vectors and type mentions).
Information units (implementing the trait InformationUnit) represent paragraphs in a given document, which can be narrative text (NaturalLanguageTaggedUnit) or structured fragments (CodeTaggedUnit). Each information unit exports a set of meta-information items (implementing the MetaInformation trait), which provide ready-made semantic data for simple analyses. This version of StORMeD provides the following meta-information:
When a meta information is not available for a unit, an AbsentMetaInformation object can be provided in its place.
Suppose you want to get all the types mentioned in a question. First, you retrieve all its information units:
scala> val questionUnits = question.units
questionUnits: Seq[ch.usi.inf.reveal.parsing.units.InformationUnit] = ...
Instead of running a visitor on all the HASTs, you can exploit the ready-made data provided by the meta information.
For example, to get all the meta information for the units, you can use flatMap:
scala> import ch.usi.inf.reveal.parsing.units._
import ch.usi.inf.reveal.parsing.units._
scala> val questionMetaInfos = questionUnits.flatMap { _.metaInformation }
questionMetaInfos: Seq[ch.usi.inf.reveal.parsing.units.MetaInformation] = List(...)
You now need to filter the result to keep only the CodeTypesMetaInformation instances, and then you can get, for example, the mentioned qualified types:
scala> val codeTypesMetaInfos = questionMetaInfos.filter { _.isInstanceOf[CodeTypesMetaInformation] }.asInstanceOf[Seq[CodeTypesMetaInformation]]
codeTypesMetaInfos: Seq[ch.usi.inf.reveal.parsing.units.CodeTypesMetaInformation] = List(...)
scala> val types = codeTypesMetaInfos.flatMap { _.qualifiedTypes }.distinct
types: Seq[ch.usi.inf.reveal.parsing.model.java.ReferenceTypeNode] = List(...)
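The filter followed by asInstanceOf shown above can also be written with collect, which selects and narrows the element type in one step and avoids the unchecked cast on the whole sequence. The sketch below is self-contained and uses simplified stand-ins rather than the real library types: MetaInformation and CodeTypesMetaInformation are redefined locally, OtherMetaInformation is a hypothetical placeholder for any other kind of meta information, and qualifiedTypes holds plain strings instead of ReferenceTypeNode objects.

```scala
object CollectSketch {
  // Simplified stand-ins for the library types; the real traits and classes
  // live in ch.usi.inf.reveal.parsing.units and carry richer data.
  sealed trait MetaInformation
  final case class CodeTypesMetaInformation(qualifiedTypes: Seq[String])
      extends MetaInformation
  // Hypothetical placeholder for any other kind of meta information.
  final case class OtherMetaInformation(note: String) extends MetaInformation

  // Keeps only the CodeTypesMetaInformation elements (typed, no cast),
  // flattens their qualified types and removes duplicates, preserving the
  // order of first occurrence.
  def mentionedTypes(metaInfos: Seq[MetaInformation]): Seq[String] =
    metaInfos
      .collect { case m: CodeTypesMetaInformation => m }
      .flatMap(_.qualifiedTypes)
      .distinct
}
```

Assuming the real CodeTypesMetaInformation can be matched with a type pattern in the same way, the collect call could replace the filter and cast on questionMetaInfos without changing the result.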