Advancing open data
Microsoft aims to close the data divide and help organizations of all sizes to realize the benefits of data and the new technologies it powers.
Data for Society

View a collection of open datasets from Microsoft and how they’re being used to address societal challenges.
Our approach
Microsoft believes everyone can benefit from collaboration around open and available data.
Enable open innovation
We’re working to promote open innovation and data governance approaches that empower data users and providers to collaborate and create value.
Learn how we’re making intellectual property a force enabler
Build partnerships for greater impact
We believe success requires partners in industry, government, and civil society around the world. Together, we promote greater access to data to benefit society and bridge the data divide.
About the Open Data Institute | Explore the Open Data Policy Lab
Make data sharing easier
We’re committed to investing in the essential assets that will make data sharing easier, including the necessary tools, frameworks, and templates. This is especially important when it comes to opening and collaborating around data to solve important societal issues.
Explore the Data for Society resource center | Read the Open Data for Social Impact Framework
Accelerating access to data
Access to data is a big challenge. We partner with industry and open data leaders to advance open data access and private sector data sharing for societal benefit.
Open Data Initiatives for AI
How Microsoft is supporting the Institutional Data Initiative and CORE to expand access to high-quality data for AI innovation.
Government open data commons
A technical guide for governments to establish high-quality and beneficial data commons.
The open data opportunity
The importance of data sharing, explained.
Microsoft Data for Society catalog
Explore datasets, use cases, and more in our Microsoft Data for Society repository.
Visit Data for Society on GitHub
Equity and inclusion | Sustainability | Health
BankNote-Net
Worldwide, millions of people have low or no vision. BankNote-Net is an open dataset for assistive, universal currency recognition, created to help people with daily tasks that involve recognizing currency.
Explore BankNote-Net on GitHub
United States broadband usage dataset
Broadband internet access is critical to providing communities with education, employment, and telecare. The broadband usage percentages dataset shows broadband access at the US county level to help address gaps in service availability.
Explore broadband data on GitHub
MS-ASL American Sign Language (ASL) dataset
In the US, over 500,000 people use ASL for communication. This ASL dataset of over 25,000 annotated videos with sign and action recognition can help researchers build machine learning models to advance sign language recognition.
Tagged hands dataset
Development of a rich hand-gesture-based interface is currently a tedious process. This dataset of 3,500 labeled depth frames of various hand poses and 140 gesture clips helps enable easy development of a gesture-based interface.
Explore the hand gestures project
Generative Neural Visual Artist (GeNeVA)
Intelligent systems can generate images and video for a range of applications, from education to accessibility. This dataset has sequences of images, associated instructions and linguistic feedback, and a modified version of the Compositional Language and Elementary Visual Reasoning (CLEVR) dataset.
Explore the GeNeVA project | Read GeNeVA publication
Learning from analog pen use to improve digital ink experiences
To help researchers understand the gaps between analog and digital pens and improve digital experiences, this dataset contains 493 entries from a diary study with 26 participants using analog pens and 178 entries from 30 participants using digital pens.
Microsoft Machine Reading Comprehension (MS MARCO)
AI and automated assistants need strong machine reading comprehension (MRC) and question answering (QA) capabilities to understand real-world dialog. This dataset contains 1,010,916 questions and 182,669 answers to improve QA and MRC.
Explore the MS MARCO project | Read MS MARCO publication
Digital Civility Gender Equality Dataset
Microsoft recognizes the importance of advocating for and advancing the release of gender-disaggregated data to realize gender equality and to close the data divide. This dataset can be leveraged by researchers and organizations to advance better gender data policies and solutions.
Explore gender equality dataset on GitHub

Microsoft Nonprofit Innovation Hub
The Nonprofit Innovation Hub is an open-source GitHub repository with lightweight solutions that enable nonprofits to innovate.
Legal frameworks
Data sharing agreements can take months to draw up, oftentimes deterring organizations from sharing data at all. As a first step toward building better processes and tools, we’re sharing a set of data agreements to govern the sharing of data, particularly in the context of training AI models.
CDLA Permissive 2.0
The Community Data License Agreement (CDLA) Permissive 2.0 is an open data agreement designed to make it easier to share and collaborate with open data.
Read the CDLA | Get more details
C-UDA 1.0
The Computational Use of Data Agreement (C-UDA) 1.0 is intended for use with datasets that may include material not owned by the data provider, but where it may have been assembled lawfully from publicly accessible sources.
Read the C-UDA | See the annotated agreement | Find the agreement on GitHub
DUA-OAI
The Data Use Agreement for Open AI Model Development (DUA-OAI) provides terms to govern the sharing of data by an organization with another for the purpose of allowing that second organization to use the data to train an AI model, where the trained model is open sourced.
Read the DUA-OAI | Find the annotated agreement | Get the details
DUA-DC
The Data Use Agreement for Data Commons (DUA-DC) can be used by multiple parties who want to share data through a common database that is accessed via an Application Programming Interface (API).
Read the DUA-DC | Get the annotated agreement | Find out more
Capabilities
Learn more about the tools and practices we employ to enable more secure and streamlined access to data.
Differential privacy
Differential privacy introduces statistical noise (slight alterations) to mask datasets and protect the privacy of individuals.
Learn about differential privacy
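To make the idea of statistical noise concrete, here is a minimal sketch of one common differential privacy technique, the Laplace mechanism, applied to a simple count. It is an illustration under stated assumptions, not a description of how any Microsoft tool is implemented: the function names (laplace_noise, private_count), the sample ages, and the epsilon value are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two independent exponential draws, each with
    # mean `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Hypothetical helper for illustration. A smaller epsilon adds more
    noise, which means stronger privacy and a less accurate answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity of this query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 67, 70, 18]  # made-up sample data
    noisy = private_count(ages, lambda age: age >= 65, epsilon=1.0)
    print(f"Noisy count of people aged 65 or over: {noisy:.2f}")
```

Each call returns a slightly different answer, so no single result reveals whether a particular individual is in the dataset, while the average over many queries stays close to the true count. Production systems also track the total privacy budget spent across queries, which this sketch leaves out.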
Azure confidential computing
Confidential computing helps to protect sensitive data in the cloud by offering security through data-in-use encryption: additional protection for your data while it’s being processed.
Read about Azure confidential computing
Azure Open Datasets
A curated collection of publicly available datasets that are ready to use in machine learning workflows and easy to access from Azure services.
Review the Azure Open Datasets
Researcher tools
Explore a collection of datasets, code, and models from Microsoft Research for the broader academic community to advance state-of-the-art research across all disciplines.