For partners who are running courses on edx.org and edge.edx.org, edX regularly makes research data available for download from the Amazon S3 storage service. The data package that data czars download from Amazon S3 consists of a set of compressed and encrypted files that contain event logs and database snapshots for all of their organizations’ edx.org and edge.edx.org courses.
Course-specific data is also available to the members of individual course teams. Users who are assigned the Admin or Staff role for the course can view and download data from the Instructor Dashboard in their live courses and from edX Insights. The data available to course teams from these applications is a subset of the data available in the data packages. For more information, see Building and Running an edX Course and Overview.
A data package consists of different files that contain event data and database data.
Note
In all file names, the date is in {YYYY}-{MM}-{DD} format.
You download these files from different Amazon S3 “buckets” and folders. See Amazon S3 Buckets and Folders.
The {org}-{site}-events-{date}.log.gz.gpg file contains a daily log of course events. A separate file is available for courses running on edge.edx.org (with “edge” for {site} in the file name) and on edx.org (with “prod” for {site}).
For a partner organization named UniversityX, these daily files are identified by the organization name, the edX site name, and the date. For example, universityx-edge-events-2014-07-25.log.gz.gpg.
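The naming pattern above can be sketched in Python as follows. The function name event_log_name is hypothetical, and lowercasing the organization name is an assumption inferred from the universityx example rather than a documented rule.

```python
from datetime import date


def event_log_name(org: str, site: str, day: date) -> str:
    """Build the daily event file name for one organization, site, and date."""
    if site not in ("prod", "edge"):
        raise ValueError("site must be 'prod' (edx.org) or 'edge' (edge.edx.org)")
    # Assumption: the organization name is lowercased, as in the example above.
    # date.isoformat() yields the {YYYY}-{MM}-{DD} form used in all file names.
    return f"{org.lower()}-{site}-events-{day.isoformat()}.log.gz.gpg"


print(event_log_name("UniversityX", "edge", date(2014, 7, 25)))
```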
Each of these compressed files can range in size from hundreds of kilobytes to tens of megabytes. When you extract a compressed file, it is approximately 20 times larger. As a result, multiple gigabytes of space might be needed to store the tracking logs for a year.
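As a rough planning aid, the arithmetic behind this estimate can be sketched as below. The factor of 20 is the approximate ratio stated above; the file sizes in the example are made up for illustration only.

```python
# Approximate expansion ratio stated in this section; real files will vary.
EXPANSION_FACTOR = 20


def estimated_extracted_bytes(compressed_sizes):
    """Estimate the disk space needed once every compressed file is extracted."""
    return sum(compressed_sizes) * EXPANSION_FACTOR


# Hypothetical example: one year of daily files at 10 MiB each.
daily_sizes = [10 * 1024**2] * 365
total = estimated_extracted_bytes(daily_sizes)
print(f"about {total / 1024**3:.1f} GiB after extraction")
```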
For information about the contents of these files, see Data Package Contents.
The {org}-{date}.zip file contains views on database tables. This file includes data as of the time of the export, for all of an organization’s courses on both the edx.org and edge.edx.org sites. A new file is available every week, representing the database at that point in time.
For a partner organization named UniversityX, each weekly file is identified by the organization name and its extraction date: for example, universityx-2013-10-27.zip.
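When sorting downloaded files, it can help to recover the organization, site, and date from a file name. The following is a minimal sketch, assuming the two naming patterns described in this section; the regular expressions and the function name parse_package_name are illustrative and not part of any edX tooling.

```python
import re
from datetime import date

# Daily event files: {org}-{site}-events-{date}.log.gz.gpg
EVENTS_RE = re.compile(
    r"^(?P<org>.+)-(?P<site>prod|edge)-events-(?P<date>\d{4}-\d{2}-\d{2})\.log\.gz\.gpg$"
)
# Weekly database files: {org}-{date}.zip
DATABASE_RE = re.compile(r"^(?P<org>.+)-(?P<date>\d{4}-\d{2}-\d{2})\.zip$")


def parse_package_name(name):
    """Return a dict describing a data package file name, or None if unrecognized."""
    for kind, pattern in (("events", EVENTS_RE), ("database", DATABASE_RE)):
        match = pattern.match(name)
        if match:
            info = match.groupdict()
            info["kind"] = kind
            # Dates in file names use the {YYYY}-{MM}-{DD} format.
            info["date"] = date.fromisoformat(info["date"])
            return info
    return None


print(parse_package_name("universityx-edge-events-2014-07-25.log.gz.gpg"))
print(parse_package_name("universityx-2013-10-27.zip"))
```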
Compressed, these files can range in size from hundreds of megabytes to tens of gigabytes. When you extract a compressed file, it is approximately 20 times larger. As a result, institutions that receive data for several courses for several years might require from tens to hundreds of gigabytes of space for data storage.
For information about the contents of this file, see Data Package Contents.
Data package files are located at the following Amazon S3 destinations:
The s3://edx-course-data/{org} folder contains the daily {org}-{site}-events-{date}.log.gz.gpg files of course event data.
The s3://course-data bucket contains the weekly {org}-{date}.zip database snapshot.
For information about accessing Amazon S3, see Access Amazon S3.
You download the files in your data package from the Amazon S3 storage service.
To download daily event files, use the AWS Command Line Interface or a third-party tool to connect to the s3://edx-course-data/{org} folder on Amazon S3.
For information about providing your credentials to connect to Amazon S3, see Access Amazon S3.
Navigate within s3://edx-course-data/{org} to locate the files that you want:
{org}/{site}/events/{year}
The event logs in the {year} folder are in compressed, encrypted files named {org}-{site}-events-{date}.log.gz.gpg.
Download the {org}-{site}-events-{date}.log.gz.gpg file.
If your organization has courses running on both edx.org and edge.edx.org, separate log files are available for the “prod” site and the “edge” site. Repeat this step to download the file for the other site.
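The download steps above can be sketched as a small helper that assembles an AWS Command Line Interface command for each site. The exact object path is an assumption based on one reading of the folder layout described above ({org}/{site}/events/{year} inside the edx-course-data bucket), so verify it against what you actually see on Amazon S3. The function names are hypothetical, and the sketch only builds the command; it does not run it or handle credentials.

```python
from datetime import date

EVENTS_BUCKET = "edx-course-data"  # bucket named in this section


def event_log_uri(org, site, day):
    """Assumed S3 location of one daily event file (verify against your bucket)."""
    filename = f"{org}-{site}-events-{day.isoformat()}.log.gz.gpg"
    return f"s3://{EVENTS_BUCKET}/{org}/{site}/events/{day.year}/{filename}"


def download_command(org, site, day, dest="."):
    """Build an `aws s3 cp <source> <destination>` command as an argument list."""
    return ["aws", "s3", "cp", event_log_uri(org, site, day), dest]


# One command per site, because "prod" and "edge" logs are separate files.
for site in ("prod", "edge"):
    print(" ".join(download_command("universityx", site, date(2014, 7, 25))))
```

The argument list can be passed to subprocess.run once your AWS credentials are configured.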
Note
If you are using a third-party tool to connect to Amazon S3, you might not be able to navigate directly between the s3://course-data bucket and the s3://edx-course-data/{org} folder. You might need to disconnect from Amazon S3 and then reconnect to the other destination.
To download a weekly database data file, connect to the edX s3://course-data bucket on Amazon S3 using the AWS Command Line Interface or a third-party tool.
For information about providing your credentials to connect to Amazon S3, see Access Amazon S3.
Download the {org}-{date}.zip database data file from the s3://course-data bucket.
Each of the files you download contains one or more files of research data.
{org}-{site}-events-{date}.log.gz.gpg
The {org}-{site}-events-{date}.log.gz.gpg file contains all event data for courses on a single edX site for one 24-hour period. After you download a {org}-{site}-events-{date}.log.gz.gpg file for your institution, you:
Decrypt the file.
Extract the log file from the compressed .gz file. The result is a single file named {org}-{site}-events-{date}.log. (Alternatively, the data can be decompressed in stream using a tool such as gzip.)
For more information about the events in this file, see Events in the Tracking Logs.
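As an illustration of decompressing in stream, the sketch below reads an already decrypted .log.gz file with Python's standard gzip module, without writing the extracted .log file to disk. It assumes the log holds one record per line, which this section does not state; the function name count_log_lines is hypothetical.

```python
import gzip


def count_log_lines(gz_path):
    """Count non-empty lines in a decrypted .log.gz file while streaming it."""
    count = 0
    # "rt" decompresses on the fly and yields text, one line at a time.
    with gzip.open(gz_path, "rt", encoding="utf-8") as stream:
        for line in stream:
            if line.strip():
                count += 1
    return count
```

Because the file is read line by line, memory use stays small even though the extracted log is roughly 20 times larger than the compressed file.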
{org}-{date}.zip
After you download the {org}-{date}.zip file for your institution, you:
Extract the contents of the file. All of the extracted files end in .gpg, which indicates that they are encrypted.
Decrypt the extracted files.
The result of extracting and decrypting the {org}-{date}.zip file is the following set of .sql, .csv, .json, and .mongo files. Note that the .sql files are tab separated.
{org}-{course}-{run}-auth_user-{site}-analytics.sql
{org}-{course}-{run}-auth_userprofile-{site}-analytics.sql
{org}-{course}-{run}-certificates_generatedcertificate-{site}-analytics.sql
{org}-{course}-{run}-course_structure-{site}-analytics.json
{org}-{course}-{run}-courseware_studentmodule-{site}-analytics.sql
{org}-email_opt_in-{site}-analytics.csv
{org}-{course}-{run}-student_courseenrollment-{site}-analytics.sql
{org}-{course}-{run}-user_api_usercoursetag-{site}-analytics.sql
{org}-{course}-{run}-user_id_map-{site}-analytics.sql
{org}-{course}-{run}-{site}.mongo
ora Subdirectory
{org}-{course}-{run}-student_anonymoususerid-prod-analytics.sql.gpg
{org}-{course}-{run}-wiki_article-{site}-analytics.sql
{org}-{course}-{run}-wiki_articlerevision-{site}-analytics.sql
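The extract step for the weekly file can be sketched with Python's standard zipfile module, followed by building one gpg command per encrypted file. The function names are hypothetical, and the sketch assumes that gpg is installed and that your private key is already in your keyring; it builds the commands but does not run them.

```python
import zipfile
from pathlib import Path


def extract_package(zip_path, dest_dir):
    """Extract a weekly {org}-{date}.zip file and list the encrypted files in it."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search recursively so that files in subdirectories such as ora are included.
    return sorted(dest.rglob("*.gpg"))


def decrypt_command(encrypted_path):
    """Build a gpg command that writes the decrypted file next to the original."""
    encrypted = Path(encrypted_path)
    decrypted = encrypted.with_suffix("")  # drops only the trailing ".gpg"
    return ["gpg", "--output", str(decrypted), "--decrypt", str(encrypted)]
```

Each command list can be passed to subprocess.run to perform the decryption.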
{org}-{course}-{run}-auth_user-{site}-analytics.sql
Information about the users who are authorized to access the course. See Columns in the auth_user Table.
{org}-{course}-{run}-auth_userprofile-{site}-analytics.sql
Demographic data provided by users during site registration. See Columns in the auth_userprofile Table.
{org}-{course}-{run}-certificates_generatedcertificate-{site}-analytics.sql
The final grade and certificate status for students (populated after course completion). See Columns in the certificates_generatedcertificate Table.
{org}-{course}-{run}-course_structure-{site}-analytics.json
This file documents the structure of a course at a point in time. The file includes data for the course, including important dates, pages, and course-wide discussion topics. It also identifies each item of course content defined in the course outline. A separate file is included for each course on the site. For more information, see Course Content Data.
{org}-{course}-{run}-courseware_studentmodule-{site}-analytics.sql
The courseware state for each student, with a separate row for each item in the course content that the student accesses. No file is produced for courses that do not have any records in this table (for example, recently created courses). See Columns in the courseware_studentmodule Table.
{org}-email_opt_in-{site}-analytics.csv
This file reports the email preference selected by students who are enrolled in any of your institution’s courses. See Institution-wide Data.
{org}-{course}-{run}-student_courseenrollment-{site}-analytics.sql
The enrollment status and type of enrollment selected by each student in the course. See Columns in the student_courseenrollment Table.
{org}-{course}-{run}-user_api_usercoursetag-{site}-analytics.sql
Metadata that describes different types of student participation in the course. See Columns in the user_api_usercoursetag Table.
{org}-{course}-{run}-user_id_map-{site}-analytics.sql
A mapping of user IDs to site-wide obfuscated IDs. See Columns in the user_id_map Table.
{org}-{course}-{run}-{site}.mongo
The content and characteristics of course discussion interactions. See Discussion Forums Data.
ora Subdirectory
The ora subdirectory contains SQL tables for data relating to any open response assessment (ORA) problems in your organization’s courses. For more information, see Open Response Assessment Data.
{org}-{course}-{run}-assessment_assessment-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_assessmentfeedback-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_assessmentfeedback_assessments-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_assessmentfeedback_options-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_assessmentfeedbackoption-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_assessmentpart-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_criterion-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_criterionoption-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_peerworkflow-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_peerworkflowitem-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_rubric-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_studenttrainingworkflow-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_studenttrainingworkflowitem-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_trainingexample-prod-analytics.sql.gpg
{org}-{course}-{run}-assessment_trainingexample_options_selected-prod-analytics.sql.gpg
{org}-{course}-{run}-submissions_score-prod-analytics.sql.gpg
{org}-{course}-{run}-submissions_scoresummary-prod-analytics.sql.gpg
{org}-{course}-{run}-submissions_studentitem-prod-analytics.sql.gpg
{org}-{course}-{run}-submissions_submission-prod-analytics.sql.gpg
{org}-{course}-{run}-workflow_assessmentworkflow-prod-analytics.sql.gpg
{org}-{course}-{run}-workflow_assessmentworkflowstep-prod-analytics.sql.gpg
{org}-{course}-{run}-student_anonymoususerid-prod-analytics.sql.gpg
A mapping of user IDs to the course-specific anonymous IDs used by open response assessment tables. See Columns in the student_anonymoususerid Table.
{org}-{course}-{run}-wiki_article-{site}-analytics.sql
Information about the articles added to the course wiki. See Fields in the wiki_article File.
{org}-{course}-{run}-wiki_articlerevision-{site}-analytics.sql
Changes and deletions affecting course wiki articles. See Fields in the wiki_articlerevision File.
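Because the decrypted .sql files are tab separated, they can be read with Python's standard csv module. The following is a minimal sketch: the function name iter_sql_rows is hypothetical, and the choices to read the files as UTF-8 and to treat quote characters as ordinary text are assumptions rather than documented properties of the export.

```python
import csv


def iter_sql_rows(path):
    """Yield each row of a tab-separated .sql export as a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quote characters in the data as literal text.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Iterating lazily avoids loading a large table, such as courseware_studentmodule, into memory all at once.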