EdX discussion data is stored as collections of JSON documents in a MongoDB database. MongoDB is a document-oriented, NoSQL database system. Documentation can be found at the MongoDB website.
In the data package, discussion data is delivered in a .mongo file, identified
by organization and course, in the format
The primary collection that holds all of the discussion posts written by users is “contents”. Two different types of objects are stored, representing the three levels of interactions that users can have in a discussion.
CommentThread represents the first level of interaction: a post that
opens a new thread, often a student question of some sort.
Comment represents both the second and third levels of interaction: a
response made directly to the conversation started by a CommentThread is a
Comment. Any further contributions made to a specific response are also
Comment objects.
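The mapping from two object types to three interaction levels can be sketched in Python. This is a minimal sketch, not the platform's own logic. It assumes that the .mongo file holds one JSON document per line, that each document names its object type in a "_type" field, and that a reply to a response records that response in a "parent_id" field while a direct response has none. The helper names and the sample documents are hypothetical.

```python
import json
from collections import Counter


def interaction_level(doc):
    """Return 1, 2, or 3 for the discussion level a document represents."""
    # Assumption: each document names its object type in a "_type" field.
    doc_type = doc.get("_type")
    if doc_type == "CommentThread":
        return 1
    if doc_type == "Comment":
        # Assumption: a reply to a response stores that response in
        # "parent_id"; a direct response to the thread has no parent.
        return 3 if doc.get("parent_id") else 2
    raise ValueError(f"unexpected _type: {doc_type!r}")


def count_levels(lines):
    """Count documents per interaction level from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[interaction_level(json.loads(line))] += 1
    return counts


# Hypothetical, heavily trimmed documents; real ones carry many more fields.
sample_lines = [
    '{"_type": "CommentThread", "title": "Question about week 1"}',
    '{"_type": "Comment", "body": "A response"}',
    '{"_type": "Comment", "body": "A reply", "parent_id": {"$oid": "abc"}}',
]
level_counts = count_levels(sample_lines)
```

Because count_levels accepts any iterable of lines, an open file handle for a .mongo file could be passed in place of sample_lines, provided the one-document-per-line assumption holds for that file.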
A sample of the field/value pairs in the .mongo file follows, along with descriptions of the attributes that the two object types share and the attributes that are specific to each type.
In addition to these collections, events are emitted to track specific user activities. For more information, see Discussion Forum Events.
Two sample rows, or JSON documents, from a
.mongo file of discussion data