2. Responsibilities of the Data Czar and Team

A data czar is the single representative at a partner institution who has the credentials to download and decrypt edX data packages. After the data is received, the data czar is responsible for transferring it securely to researchers and other interested parties. Because this data is sensitive, responsibility for these activities is restricted to one individual.
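When a data package contains many files, decryption is usually scripted. The following Python sketch assumes that GnuPG's `gpg` command is installed, that the data czar's private key is already imported into the local keyring, and that each encrypted file name ends in `.gpg`. The function names are hypothetical, and passphrase handling is left to your local GnuPG configuration.

```python
import subprocess
from pathlib import Path


def build_decrypt_command(encrypted: Path) -> list[str]:
    """Return a gpg argument list that decrypts `encrypted` next to itself.

    Assumption: the private key needed for decryption is already in the
    local GnuPG keyring.
    """
    if encrypted.suffix != ".gpg":
        raise ValueError(f"expected a .gpg file, got {encrypted.name}")
    output = encrypted.with_suffix("")  # e.g. data.tar.gz.gpg -> data.tar.gz
    return [
        "gpg",
        "--batch",  # do not prompt interactively
        "--yes",  # overwrite an existing output file
        "--output", str(output),
        "--decrypt", str(encrypted),
    ]


def decrypt_all(directory: Path) -> list[Path]:
    """Decrypt every *.gpg file in `directory` and return the output paths."""
    outputs = []
    for encrypted in sorted(directory.glob("*.gpg")):
        # check=True raises CalledProcessError if gpg reports a failure.
        subprocess.run(build_decrypt_command(encrypted), check=True)
        outputs.append(encrypted.with_suffix(""))
    return outputs
```

Because `check=True` is passed, the loop stops at the first file that `gpg` fails to decrypt, which makes an incomplete or corrupted download visible immediately.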

At each partner institution, the data czar is the primary point of contact for information about edX data. At some institutions, only the data czar works on research projects that use the course data in edX data packages. At other institutions, the data czar works with a team of additional contributors, or is responsible only for making a secure transfer of the data to the research team. Typically, the data team includes members in the following roles (or a data czar with these skill sets).

  • Database administrators work with the SQL and NoSQL data files and write queries on the data.

  • Statisticians and data analysts mine the data.

  • Educational researchers pose questions and interpret the results of queries on the data.

For more information, see Skills and Experience of Other Team Members.

All of the individuals who are permitted to access the data should be trained in, and comply with, their institution’s secure data handling protocols. For more information about data security policies and procedures, see Data Security Guidelines for Data Czars.

2.1. Skills and Experience of Data Czars

The individuals who are selected by a partner institution to be edX data czars typically have experience working with sensitive student data, are familiar with encryption, decryption, and file transfer protocols, and can validate, copy, move, and store large files.
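Validating a large file usually means comparing a checksum computed before and after it is copied or moved. The following Python sketch hashes files in fixed-size chunks so that memory use stays constant regardless of file size. The helper names `sha256_of` and `verified_copy` are hypothetical, and SHA-256 is one reasonable choice of hash rather than an edX requirement.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst, confirm both have the same SHA-256 digest, and return it."""
    expected = sha256_of(src)
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    actual = sha256_of(dst)
    if actual != expected:
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return actual
```

Recording the digest returned by `verified_copy` alongside each distributed file lets recipients repeat the same check on their end.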

Depending on the size of your institution and the number of contributors in the research team, the data czar might need to be a qualified research and analytics team lead, a senior research manager, or the director of the research and analytics program.

The data czar is responsible for ensuring that any sharing of this data complies with your institution's policies and your country's regulations.

2.1.1. General Skills

  • Ability to set up and manage data access.

  • Knowledge of general data privacy and security best practices.

  • Experience with managing sensitive student data.

2.1.2. Technical Skills

  • Familiarity with PGP™ security software and GPG encryption and decryption.

  • Ability to download large files from Amazon Simple Storage Service™ (Amazon S3™).

  • Ability to set up a secure internal data distribution pipeline and run scripts to download files in bulk from Amazon S3.

  • Experience working with archive files in TAR, GZ, and ZIP formats.

  • Familiarity with SQL and NoSQL databases.

  • Familiarity with CSV and JSON file formats.

  • Experience copying, moving, and storing large files in bulk.

  • Ability to validate the data and files received and distributed.

  • Ability to run scripts to process large files and do data analysis.
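As an illustration of the archive-handling skills listed above, the following Python sketch extracts ZIP, TAR, TAR.GZ, and single-file GZ archives using only the standard library, and refuses members whose paths would land outside the destination directory. The function name and the choice to reject symbolic and hard links are assumptions made for this example, not edX requirements.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def _check_inside(dest: Path, member_name: str) -> None:
    """Raise if an archive member would be written outside dest."""
    target = (dest / member_name).resolve()
    if not target.is_relative_to(dest.resolve()):
        raise ValueError(f"unsafe path in archive: {member_name}")


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a .zip, .tar, .tar.gz/.tgz, or single-file .gz; return member names."""
    dest.mkdir(parents=True, exist_ok=True)
    name = archive.name.lower()
    if name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()
            for member in names:
                _check_inside(dest, member)
            zf.extractall(dest)
    elif name.endswith((".tar", ".tar.gz", ".tgz")):
        with tarfile.open(archive, "r:*") as tf:  # "r:*" detects the compression
            members = tf.getmembers()
            for member in members:
                _check_inside(dest, member.name)
                if member.issym() or member.islnk():
                    raise ValueError(f"refusing to extract link: {member.name}")
            if hasattr(tarfile, "data_filter"):  # Python versions with extraction filters
                tf.extractall(dest, members=members, filter="data")
            else:
                tf.extractall(dest, members=members)
            names = [member.name for member in members]
    elif name.endswith(".gz"):
        # A lone .gz file holds one compressed file rather than an archive.
        out = dest / archive.name[:-3]
        with gzip.open(archive, "rb") as src, out.open("wb") as dst:
            shutil.copyfileobj(src, dst)
        names = [out.name]
    else:
        raise ValueError(f"unsupported archive type: {archive.name}")
    return names
```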

2.2. Skills and Experience of Other Team Members

In addition to the data czar, a partner institution can assemble a team of contributors to carry out its research projects. This team might include database administrators, software engineers, data specialists, and educational researchers. The team can be large or small, but collectively its members need to be able to work with SQL and NoSQL databases, write queries, and convert the data from raw formats into standard research packages, such as CSV files, spreadsheets, or other formats that the researchers need.
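Converting raw JSON records into a flat CSV file is a typical first conversion step. The following Python sketch assumes input with one JSON object per line; it flattens nested objects into dotted column names and keeps lists as JSON text. The field names in the usage example (`user`, `event`, `tags`) are hypothetical and do not describe any particular edX file.

```python
import csv
import io
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)  # keep lists as JSON text
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(lines, out) -> int:
    """Write one CSV row per non-blank JSON line to `out`; return the row count."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=columns)  # absent keys become empty cells
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    sample = [
        '{"user": {"id": 7}, "event": "play_video"}',
        '{"user": {"id": 8}, "tags": ["é"]}',
    ]
    buffer = io.StringIO()
    json_lines_to_csv(sample, buffer)
    print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""` (as the Python `csv` documentation recommends) and an explicit `encoding="utf-8"` so that non-ASCII text survives the conversion.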

2.2.1. General Skills

  • Attention to detail.

  • Experience setting up and testing a data conversion pipeline.

  • Ability to identify interesting features in a complex and rich data set.

  • Familiarity with anonymization and obfuscation techniques.

  • Familiarity with data privacy and security best practices.

  • Experience with managing sensitive student data.
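One common pseudonymization technique replaces each identifier with a keyed hash, so that the same student always receives the same pseudonym but the mapping cannot be reversed without the key. The following Python sketch uses HMAC-SHA256 from the standard library; the key handling and the 16-character truncation are illustrative assumptions. This is pseudonymization rather than full anonymization: other fields in a record can still identify a person.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(secret_key: bytes):
    """Return a function that maps an identifier to a stable keyed pseudonym."""

    def pseudonymize(identifier: str) -> str:
        mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
        # Truncated to 64 bits for readability; keep more characters if
        # collisions across a very large population are a concern.
        return mac.hexdigest()[:16]

    return pseudonymize


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # store this key separately from the shared data
    pseudo = make_pseudonymizer(key)
    print(pseudo("student_42"), pseudo("student_42") == pseudo("student_42"))
```

Using a keyed HMAC rather than a plain hash matters here: without the key, someone who can guess candidate identifiers cannot simply hash them and match the results.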

2.2.2. Technical Skills

  • Familiarity with CSV files, MongoDB® collections, JSON documents, Unicode, XML, and HTML.

  • Ability to set up, query, and administer both SQL and NoSQL databases.

  • Experience with bash and other command line scripts.

  • Basic or advanced scripting skills (for example, in the Python or Ruby programming language) to convert, join, and aggregate data from different data sources.

  • Experience with data mining and data aggregation across a rich, varied data set.

  • Ability to write parsing scripts that properly handle JSON serialization and Unicode.
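Parsing scripts for JSON-lines files need to cope with compressed input, non-ASCII text, and the occasional malformed line. The following Python sketch reads a plain or gzip-compressed file as UTF-8, reports bad lines instead of stopping, and aggregates records by a top-level field. The helper names are hypothetical, and the `event_type` field used in the check below is an assumption for illustration.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def read_json_lines(path: Path):
    """Yield (record, error) pairs from a plain or gzip-compressed JSON-lines file.

    Bytes are decoded as UTF-8 with errors="replace", so one bad byte does not
    stop the run; malformed JSON lines are reported instead of raising.
    """
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                yield json.loads(line), None
            except json.JSONDecodeError as exc:
                yield None, f"line {number}: {exc.msg}"


def count_by(path: Path, field: str):
    """Count records per value of a top-level field; also return parse errors."""
    counts, errors = Counter(), []
    for record, error in read_json_lines(path):
        if error:
            errors.append(error)
        elif isinstance(record, dict):
            value = record.get(field, "<missing>")
            # Non-string values (numbers, lists, null) are counted by their JSON text.
            counts[value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)] += 1
    return counts, errors
```

For example, `count_by(Path("events.json.gz"), "event_type")` (a hypothetical file name) returns a `Counter` of event types together with a list of error messages that can be reviewed before the data is distributed.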

2.3. Resources for Data Czars and Teams

For discussions, edX hosts the openedx-analytics Google Group, which is open to the public. For more information about this and other resources, see Resources for Data Czars and Researchers.