1. Getting Started with the edX Data Package
The following sections describe how to get started with the edX data package, including an overview of the data package contents and the main steps for getting access to it.
The edX data package is a collection of data generated by courses and by learner activity in those courses. To get access to the data package covering its courses, a partner institution must have an agreement with edX that includes data access.
The purpose of the edX data package is to enable researchers to understand how learners use edX, to measure learning, and to analyze the results of experiments. It can also help course teams understand how well a course is working while it is in progress, and possibly make changes mid-course. The data package also includes lists of email addresses of enrolled learners who have consented to receive emails, which supports contact between course staff and those learners.
The data package includes the following categories of information.
The event log (also known as the clickstream)
Course content exports
Courseware database exports
Forums database exports
Open response database exports
Email opt-in data
The edX data package consists of a set of compressed and encrypted files that contain event logs and database snapshots for all of your organization’s edx.org and edge.edx.org courses. These files can be downloaded by data czars from Amazon S3.
In addition to the edX data package that is available to data czars, course-specific data is also available to the members of individual course teams. Users who are assigned the Admin or Staff role for the course can view and download data from the instructor dashboard in their live courses and from edX Insights. The data available to course teams from these applications is a subset of the data available in the data packages. For more information, see Building and Running an edX Course and Overview of EdX Insights.
In order to get access to the edX data package, a partner institution must have an agreement with edX that includes data access. In addition, the partner must appoint a data czar, who acts as a trusted point of contact between edX and the partner with regard to the data package. For more information, see Responsibilities of the Data Czar and Team.
The data package is generated by edX and stored securely on the Amazon S3 service. The data czar has the credentials to download and decrypt edX data packages. The data czar is responsible for transferring data securely to researchers and other interested parties after it is received.
In addition to the edX data package, edX makes data available through the research data exchange (RDX). The research data exchange is a mutual data exchange among edX partners: only those edX partners who choose to participate in RDX contribute data to the program, and only researchers at those institutions can request data from the program.
Researchers at participating partner institutions must propose, and be approved for, a specific educational research project to receive RDX data. For more information, see Using the Research Data Exchange Data Package.
Here is an overview of the main steps that a partner institution must perform to get access to the edX data package.
Enter into an agreement with edX for data package access. Your institution may already have such an agreement. For more information, contact your edX partner manager.
Select a data czar at the partner institution. A data czar is the representative at a partner institution who has direct access to the data package. For more information, see Responsibilities of the Data Czar and Team.
The data czar creates a public/private key pair for securely transferring files and sends the public key to edX. For more information, see Keys and Credentials for Data Transfers.
EdX sends the data czar an encrypted file that includes credentials for accessing the edX data package on Amazon S3.
The data czar uses these credentials to access Amazon S3 and download the edX data package.
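The key, credential, and download steps in this overview can be sketched as a short sequence of command lines. The following Python sketch only builds and prints those commands; it does not run them. It assumes that the key pair is a GPG key pair managed with the `gpg` tool and that the download uses the AWS command line interface (`aws`), neither of which is stated in this section. The email address, file names, and S3 location are hypothetical placeholders, and the real values come from edX and from your institution.

```python
"""Sketch of the data czar's key, credential, and download steps.

Assumptions (not taken from this section): GPG is used for the key pair,
and the AWS CLI is used to reach Amazon S3. All names below are hypothetical.
"""
import shlex

# Hypothetical placeholders: substitute the values for your institution.
CZAR_EMAIL = "data-czar@example.edu"             # identity on the key pair
CREDENTIALS_FILE = "credentials.txt.gpg"         # encrypted file sent by edX
BUCKET_URI = "s3://example-bucket/example-org/"  # location of the data package


def export_public_key_cmd(email, out_path="public_key.asc"):
    """Command that exports the ASCII-armored public key to send to edX."""
    return ["gpg", "--armor", "--output", out_path, "--export", email]


def decrypt_cmd(encrypted_path, out_path):
    """Command that decrypts a file with the data czar's private key."""
    return ["gpg", "--output", out_path, "--decrypt", encrypted_path]


def list_bucket_cmd(bucket_uri):
    """Command that lists the objects available under bucket_uri."""
    return ["aws", "s3", "ls", bucket_uri]


def download_cmd(bucket_uri, local_dir):
    """Command that copies every object under bucket_uri into local_dir."""
    return ["aws", "s3", "cp", bucket_uri, local_dir, "--recursive"]


def workflow():
    """Return the commands in the order the overview describes them."""
    return [
        ["gpg", "--full-generate-key"],                    # create the key pair
        export_public_key_cmd(CZAR_EMAIL),                 # public key for edX
        decrypt_cmd(CREDENTIALS_FILE, "credentials.txt"),  # read S3 credentials
        list_bucket_cmd(BUCKET_URI),                       # inspect the package
        download_cmd(BUCKET_URI, "edx-data/"),             # download the package
    ]


if __name__ == "__main__":
    for cmd in workflow():
        print(shlex.join(cmd))
```

The sketch omits two things: configuring the AWS CLI with the decrypted credentials before the `aws s3` commands, and decrypting the downloaded package files afterwards. Under the same GPG assumption, `decrypt_cmd` could be reused for each downloaded file.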