All data uploaded to Open Supply Hub is cleaned and deduplicated by a matching algorithm, and then assigned an industry-standard ID. This OS ID is free and accessible to anyone, meaning you can use it to easily sync your data with other organizations, platforms and service providers.
The Goal: Cleanly and accurately map every production facility in the world and assign it a free and universal ID that can be used by all. Every organization connected to a production facility has the same ID for that facility, so they can easily sync their data and collaborate.
How we’re getting there
The Algorithm:
OS Hub is powered by a set of GIS and data-matching tools, which process each line of data contributed to OS Hub to detect whether a submitted facility already exists in the database.
3 Possible Outcomes:
- If the algorithm is more than 80% confident that an uploaded facility matches an existing facility in the database, it will automatically match them.
- If it is less than 50% confident that an uploaded facility matches an existing facility in the database, it will create a new facility profile and OS ID.
- If the algorithm’s confidence in a match lies between 50% and 80%, it will send that potential match to the OS Hub data moderation team for manual review.
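The three outcomes above can be sketched as a simple threshold check. This is only an illustration and not OS Hub's actual code: the names `Outcome` and `classify_match` are hypothetical, the confidence is assumed to be a score between 0 and 1, and scores of exactly 50% or 80% are assumed to go to manual review.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the three outcomes described above."""
    AUTO_MATCH = "auto_match"
    NEW_FACILITY = "new_facility"
    MANUAL_REVIEW = "manual_review"


def classify_match(confidence: float) -> Outcome:
    """Map a match confidence (assumed to be in the range 0.0 to 1.0) to an outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.80:
        # More than 80% confident: match automatically.
        return Outcome.AUTO_MATCH
    if confidence < 0.50:
        # Less than 50% confident: create a new facility profile and OS ID.
        return Outcome.NEW_FACILITY
    # Assumption: the boundary values 0.50 and 0.80 fall into manual review.
    return Outcome.MANUAL_REVIEW
```

For example, under these assumptions `classify_match(0.65)` returns `Outcome.MANUAL_REVIEW`, so that potential match would be routed to the moderation team.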
Assigning OS IDs:
Every facility listed in OS Hub has an OS ID. This is a 15-character unique identifier made up of 4 segments:
CN 2019067 NZ95A M
From left to right these segments are:
- A 2-character country code.
- The 4-digit year and 3-digit day of the year indicating when the OS ID was assigned.
- 5 characters that represent a random number.
- A one-character “check digit” that is calculated based on the previous 14 characters and can be used to validate that an OS ID has not been mistyped or otherwise damaged.
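To make the structure concrete, here is a minimal sketch that splits an OS ID into the four segments listed above. The names `OsIdParts` and `parse_os_id` are hypothetical, and the sketch assumes the ID contains only uppercase letters and digits. It checks the layout only; it does not verify the check digit, because the calculation is not described here.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed layout: 2 letters, 4-digit year, 3-digit day of year,
# 5 alphanumeric characters, 1 alphanumeric check character.
OS_ID_PATTERN = re.compile(r"^([A-Z]{2})(\d{4})(\d{3})([A-Z0-9]{5})([A-Z0-9])$")


@dataclass(frozen=True)
class OsIdParts:
    country_code: str
    assigned_on: date
    random_part: str
    check_char: str


def parse_os_id(os_id: str) -> OsIdParts:
    """Split an OS ID into its segments; spaces between segments are ignored."""
    compact = os_id.replace(" ", "").upper()
    match = OS_ID_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a 15-character OS ID: {os_id!r}")
    country, year_text, day_text, random_part, check_char = match.groups()
    year, day_of_year = int(year_text), int(day_text)
    # Convert the year and day-of-year segment into a calendar date.
    assigned_on = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    if day_of_year < 1 or assigned_on.year != year:
        raise ValueError(f"invalid day of year in OS ID: {os_id!r}")
    return OsIdParts(country, assigned_on, random_part, check_char)
```

Applied to the example above, `parse_os_id("CN 2019067 NZ95A M")` yields the country code `CN`, an assignment date of 8 March 2019 (day 67 of 2019), the random segment `NZ95A`, and the check character `M`.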
When a facility is first created in OS Hub, it is automatically allocated its own ID. Anyone can access and use these IDs for free.
OS IDs can enable interoperability between databases and act as a “clearing” ID against which multiple systems can connect. Using facility IDs, rather than names, as a central reference point eliminates confusion surrounding facility identification and helps stakeholders across sectors quickly reach an understanding of shared connections at the facility level. OS IDs can be incorporated into internal systems, such as PLMs or sourcing platforms, shared on public-facing dashboards, or displayed on customer-facing supplier lists, to demonstrate commitments to transparency in service of collaboration.
Data Moderation:
To ensure that each OS ID points to a unique, clean facility profile, the OS Hub team continually moderates data in the tool, alongside the automated work of OS Hub’s algorithm, to:
- Evaluate and eliminate potential duplicates
- Promote higher-quality data on facility profiles
- Update GPS coordinates
OS Hub’s technical team also runs regular training exercises to refine the algorithm and related moderation tools, so that the open dataset is of the highest possible quality.
- Learn more about what is and isn’t considered a duplicate facility in OS Hub.
- View OS Hub’s full moderation policy.
- Spotted a duplicate in our data? Have access to more accurate GPS coordinates for a facility? Click the flag icon on a facility profile to share that with the OS Hub team.