In May 2021, Microsoft announced plans to retire the Microsoft Academic Graph (MAG) at the end of 2021. The contributions from Microsoft Research to open scholarly metadata have been immense, and the retirement of MAG will have a significant impact on the research community as well as on the many apps and services enabled by Microsoft’s openly licensed knowledge graph.
At The Lens, we are grateful for the long-term, collaborative partnership we have had with the Microsoft Research team, and we have enormous respect for what the team has built. The Lens is proud to have been a principal collaborator of Microsoft Academic for years, and the only organisation to contribute data to the MAG project, in the form of tens of millions of global patents and their metadata.
The open metadata provided by MAG has given rise to an ecosystem of apps and services that rely on MAG and will be affected by its retirement. These include several scholarly abstracting and indexing platforms, knowledge and citation graphs, and other community-supported infrastructure, including The Lens.
We are grateful for the significant amount of interest from the community about our post-MAG plans, including questions about roles that The Lens might play in a new resource. In this post we cover the next steps for Scholarly Works in The Lens after the discontinuation of MAG.
What are the unique contributions of MAG?
MAG has often been used to enrich more traditional metadata sources, such as Crossref, because of the additional content (e.g. scholarly works without a DOI) and metadata coverage provided by MAG. There have been a number of discussions of the implications of MAG’s retirement and what will be lost as a result (e.g. Why open alone is not enough, What comes after MAG?, What do we lose when MAG goes away?, Microsoft Academic Graph is being discontinued. What’s next?), which outline the key contributions from MAG.
While The Lens MetaRecord strategy does not rely on any single data source, and MAG data already in The Lens will not be lost once MAG is retired, there are five unique contributions from MAG that cannot easily be replaced by existing data sources:
- Additional content: MAG leverages the Bing search engine to discover and harvest metadata for many scholarly works that are not available through other sources such as DOI registration agencies. This additional content includes research datasets, conference publications, reports and other grey literature not typically included in traditional scholarly metadata aggregators.
- Entity disambiguation: The Microsoft Research team utilises machine learning to disambiguate authors and institutions, assigning unique identifiers to each, and enabling enhanced discovery and analysis in downstream platforms.
- Fields of Study: A very valuable component of the MAG dataset is the dynamic Fields of Study taxonomy for categorising research output from across the disciplinary spectrum.
- Citations: Microsoft Research’s machine learning capabilities have produced the largest open citation graph available. MAG’s broad content coverage increases citation coverage and helps identify more relationships between connected entities.
- Recommended works: Microsoft Research’s machine learning also identifies similar works for recommended reading, another capability unique to MAG.
What are the plans to replace MAG?
A number of initiatives are already underway to fill the gap left by the retirement of MAG (e.g. MAG replacement update: meet OpenAlex!, Investing in Semantic Scholar’s Knowledge Graph), while stakeholders and scholarly infrastructure groups from across the affected community have been discussing their own post-MAG plans.
While these initiatives have the potential to replace MAG, The Lens is taking a two-pronged strategy to move beyond MAG: community engagement through Collective Action, and The Lens MetaRecord. We will engage with community groups to help wherever we can, while forging ahead with the MetaRecord strategy. This approach allows us to keep growing The Lens MetaRecord and leaves us well positioned to take advantage of any MAG replacement(s) from the community.
The Lens MetaRecord Strategy
The Lens MetaRecord strategy manages complexities around record variability by merging content sources and contextual metadata relevant to the original record. Complemented with a unique open persistent identifier, The Lens ID, The Lens MetaRecord creates an open, granular, and dynamic record mapping system for knowledge artefacts (e.g. patents, scholarly works) and entities (e.g. individuals and institutions).
Using The Lens MetaRecord approach, we are able to flexibly ingest any open data that adds value to the MetaRecord and the user experience. While MAG is a key data source for The Lens, the MetaRecord strategy allows for any additional open data sources to be ingested to supplement the unique content and metadata provided by MAG.
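To make the idea of merging sources into one record more concrete, here is a minimal Python sketch. It is not The Lens implementation: it assumes records from each source are matched on a normalised DOI, that earlier sources take precedence when fields conflict, and that the `rec-N` identifiers merely stand in for a persistent identifier such as The Lens ID. The names `MergedRecord`, `normalize_doi` and `merge_sources` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MergedRecord:
    """Hypothetical merged record: one identifier, fields drawn from several sources."""
    record_id: str
    fields: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so variants compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_sources(source_batches):
    """Merge (source_name, records) batches; earlier sources win on conflicting fields.

    Records sharing a normalised DOI are merged into one record; records
    without a DOI each become a separate record (an assumption of this sketch).
    """
    merged = {}
    counter = 0
    for source_name, records in source_batches:
        for rec in records:
            key = normalize_doi(rec.get("doi"))
            if key is None or key not in merged:
                counter += 1
                # Hypothetical identifier scheme, for illustration only.
                record = MergedRecord(record_id=f"rec-{counter}")
                merged[key if key is not None else f"_nodoi_{counter}"] = record
            else:
                record = merged[key]
            # Only fill fields that no earlier source has supplied.
            for name, value in rec.items():
                if value is not None and name not in record.fields:
                    record.fields[name] = value
            record.sources.append(source_name)
    return list(merged.values())
```

For example, passing an illustrative Crossref batch followed by a MAG batch that contains the same work under a differently formatted DOI yields one merged record that keeps the Crossref title and gains the MAG Fields of Study. A production system would also need to match works that lack DOIs, for example by title and author similarity; this sketch simply keeps them as separate records.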
Through this approach, The Lens Scholarly MetaRecord (SMR) will continue to grow, because the MetaRecord structure is resilient to changes in data sources. Regardless of the outcomes of other initiatives to replace MAG, The Lens plans for the SMR beyond MAG include the following:
Ingest Additional Content
As part of the SMR strategy, additional content sources will be ingested to fill the content gap that MAG’s discontinuation will leave. By seeking out complementary content sources, we can expand coverage.
In the short term, additional content will be covered by ingesting DataCite metadata. DataCite is a DOI registration agency providing persistent identifiers and metadata services for research data and other research outputs contributed by data repositories, preprint archives and publishers. DataCite therefore provides complementary content and metadata not currently covered by Crossref or PubMed, making it an ideal source to help fill the content gap that MAG will leave.
The content strategy will then expand to regional DOI registration agencies. Beyond this, The Lens will use any content or data source that is committed to the principles of FAIR and open data to improve the coverage and quality of The Lens SMR.
By extending the MetaRecord concept to human beings (e.g. authors, inventors) and legal entities (e.g. institutions, companies), The Lens will build on the entity disambiguation provided by MAG to create MetaRecords with persistent identifiers for individuals and institutions.
Using a similar approach to the SMR, entity MetaRecords for both individuals and institutions will be created by merging entity metadata from scholarly works and patents with other existing entity data sources (e.g. ORCID, ROR, OpenCorporates).
This entity MetaRecord approach will also help connect knowledge silos by disambiguating and matching the individuals and institutions represented in the author affiliations of publications, and unifying them with the individuals and institutions represented in the patent inventors, applicants, and owners. Linking these entity MetaRecords with existing scholarly works or patent MetaRecords will further enhance discovery of the connections between entities.
We have recently launched The Lens Collective Action Project (CAP) which includes opportunities for the community to contribute to collective action and help ensure the long-term viability of FAIR and open data through The Lens. Here are two ways to participate:
Open Content Initiatives
As part of CAP, we have launched an Open Content Initiative as a way of participating in collective action to enable community-supported infrastructure. The initiative is aimed particularly at publishers, who can contribute by providing additional metadata or full text for indexing, improving discovery and extending content coverage.
The Lens is committed to collaborating with other stakeholders, scholarly infrastructure and community groups on initiatives to replace MAG or to recreate aspects of it, such as taxonomies, citations, recommended works or web crawling for additional content.
The Lens MetaRecord strategy has been used and refined over many years and continues to be fit for purpose. We are grateful for the active interest from The Lens community and other stakeholders following the announcement from Microsoft. As a project of the social enterprise Cambia, The Lens remains committed to our mission to make participants in the innovation ecosystem as effective and efficient as possible.