The following sections describe how to get started with the edX data package,
including an overview of the data package contents and the main steps for
getting access to it.
The edX data package is a collection of data generated from courses and from
learner activity in those courses. To get access to the data package for its
courses, a partner institution must have an agreement with edX that includes
data access.
The purpose of the edX data package is to support research into how learners
use edX, to measure learning, and to analyze the results of experiments. It can
also help course teams understand how well a course is working while it is in
progress and, if needed, make changes mid-course. The data package also
includes lists of the email addresses of enrolled learners who have consented
to receive email, so that course staff can contact those learners.
The data package includes the following categories of information.
The event log (also known as the clickstream)
Course content exports
Courseware database exports
Forums database exports
Open response database exports
Email opt-in data
The edX data package consists of a set of compressed and encrypted files that
contain event logs and database snapshots for all of your organization’s
edx.org and edge.edx.org courses. Data czars download these files from
Amazon S3.
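As an illustration only, the download step might look like the following minimal sketch. It assumes a boto3-style S3 client (any object with `list_objects_v2` and `download_file` methods) and that the encrypted files end in `.gpg`; the bucket and prefix names are hypothetical placeholders, not edX's actual storage locations.

```python
from pathlib import Path
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following list_objects_v2 pagination."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # S3 returns at most 1,000 keys per call; resume where the last page stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def download_encrypted_files(client: Any, bucket: str, prefix: str,
                             dest: Path, suffix: str = ".gpg") -> list[Path]:
    """Download every key ending in `suffix` into `dest` and return the local paths.

    Assumption: the encrypted package files are named with a ".gpg" suffix.
    """
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for key in iter_keys(client, bucket, prefix):
        if not key.endswith(suffix):
            continue  # skip anything that is not an encrypted package file
        local = dest / Path(key).name
        client.download_file(bucket, key, str(local))
        saved.append(local)
    return saved
```

With boto3 installed, a call such as `download_encrypted_files(boto3.client("s3"), "example-bucket", "exampleorg/", Path("package"))` would fetch the files, where `example-bucket` and `exampleorg/` stand in for the locations that edX provides. Passing the client in as a parameter keeps the logic testable without network access.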
In addition to the edX data package that is available to data czars,
course-specific data is also available to the members of individual course
teams. Users who are assigned the Admin or Staff role for the course can view
and download data from the instructor dashboard in their live courses and from
edX Insights. The data available to course teams from these applications is a
subset of the data available in the data packages. For more information, see
Building and Running an edX Course and Overview of EdX Insights.
To get access to the edX data package, a partner institution must have
an agreement with edX that includes data access. In addition, the partner must
appoint a data czar, who acts as a trusted point of contact between edX and
the partner with regard to the data package. For more information, see
Responsibilities of the Data Czar and Team.
The data package is generated by edX and stored securely on the Amazon S3
service. The data czar has the credentials to download and decrypt edX data
packages. After receiving the data, the data czar is responsible for
transferring it securely to researchers and other interested parties.
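Decryption happens locally after download. A minimal sketch, assuming the files are GnuPG-encrypted to the data czar's public key and carry a `.gpg` suffix (both assumptions, not stated in this section), builds a non-interactive `gpg` command for each file and runs it with `subprocess`. The helper names are hypothetical; `--batch`, `--yes`, `--output`, and `--decrypt` are standard `gpg` options.

```python
import subprocess
from pathlib import Path


def decrypted_path(encrypted: Path) -> Path:
    """Strip the trailing .gpg suffix, for example 'x.zip.gpg' -> 'x.zip'."""
    # Assumption: encrypted package files end in ".gpg".
    if encrypted.suffix != ".gpg":
        raise ValueError(f"expected a .gpg file, got {encrypted.name}")
    return encrypted.with_suffix("")


def gpg_decrypt_command(encrypted: Path) -> list[str]:
    """Build a non-interactive GnuPG command that decrypts one file."""
    return ["gpg", "--batch", "--yes",
            "--output", str(decrypted_path(encrypted)),
            "--decrypt", str(encrypted)]


def decrypt_all(files: list[Path]) -> list[Path]:
    """Decrypt each file with the private key in the local GnuPG keyring."""
    outputs = []
    for f in files:
        # check=True raises CalledProcessError if gpg reports a failure.
        subprocess.run(gpg_decrypt_command(f), check=True)
        outputs.append(decrypted_path(f))
    return outputs
```

Separating command construction from execution makes it easy to review exactly what will run before any data is decrypted.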
In addition to the edX data package, edX makes data available through the
Research Data Exchange (RDX). RDX is a mutual data exchange among edX partners:
only those edX partners who choose to participate in RDX contribute data to the
program, and only researchers at those institutions can request data from it.
Researchers at participating partner institutions must propose, and be approved
for, a specific educational research project to receive RDX data. For more
information, see Using the Research Data Exchange Data Package.
Here is an overview of the main steps that a partner institution must perform
to get access to the edX data package.
1. Enter into an agreement with edX for data package access. Your institution
   may already have such an agreement. For more information, contact your edX
   partner manager.
2. Select a data czar at the partner institution. A data czar is the
   representative at a partner institution who has direct access to the data
   package. For more information, see Responsibilities of the Data Czar and
   Team.
3. The data czar creates a public/private key pair for securely transferring
   files and sends the public key to edX. For more information, see
   Keys and Credentials for Data Transfers.
4. EdX sends the data czar an encrypted file that includes credentials for
   accessing the edX data package on Amazon S3.
5. The data czar uses these credentials to access Amazon S3 and download the
   edX data package.
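The last two steps can be sketched in Python as follows, assuming the credentials file has already been decrypted (for example, with GnuPG) and is a CSV whose header row names the columns "Access Key Id" and "Secret Access Key". Those column names are an assumption, so check them against the actual file. `read_credentials` and `make_s3_client` are hypothetical helpers; `boto3.client` with `aws_access_key_id` and `aws_secret_access_key` is the real boto3 entry point.

```python
import csv
import io


def read_credentials(csv_text: str) -> tuple[str, str]:
    """Return (access_key_id, secret_access_key) from a decrypted credentials CSV.

    Assumption: the header row contains "Access Key Id" and "Secret Access Key";
    headers are matched case-insensitively and surrounding spaces are ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader)  # the first data row holds the key pair
    normalized = {k.strip().lower(): v.strip() for k, v in row.items()}
    return normalized["access key id"], normalized["secret access key"]


def make_s3_client(csv_text: str):
    """Create a boto3 S3 client from the decrypted credentials (requires boto3)."""
    import boto3  # imported here so read_credentials works without boto3 installed

    key_id, secret = read_credentials(csv_text)
    return boto3.client("s3", aws_access_key_id=key_id,
                        aws_secret_access_key=secret)
```

Reading the keys from the decrypted file at run time, rather than pasting them into scripts, helps keep them out of source control and shared notebooks.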