Tuesday, July 5, 2022
HomeBig DataInformation integrity – A priority of data-driven organizations

Information integrity – A priority of data-driven organizations


Within the InsideView Alignment Report 2020, greater than 70% of income leaders rank knowledge administration as the best precedence. Though, many organizations have applied a system for knowledge assortment and evaluation, nonetheless, their greatest concern stays to take care of the integrity of their knowledge.

The time period ‘knowledge integrity’ is typically used as a course of or a state of knowledge. Both method, it refers to knowledge being correct, legitimate, and constant, throughout all knowledge sources.

In layman phrases, knowledge integrity refers back to the knowledge that your staff can belief, really feel assured that’s protected, and use for no matter function they need, with out worrying about knowledge high quality.

These facets are extraordinarily necessary, particularly for knowledge analysts that combine and convey knowledge collectively knowledge from a number of sources to derive helpful insights and to retain prospects. 

Varieties of knowledge integrity 

Information integrity has varied facets, however at a high-level, it may be divided into two sorts: bodily and logical. Each of those sorts outline numerous strategies and constraints that implement integrity in datasets.

Bodily knowledge integrity 

Bodily knowledge integrity pertains to defending knowledge in opposition to exterior or bodily calamities, similar to energy outages, pure disasters, hackers, and so on. These issues make it unattainable for customers to entry knowledge from the database, and are normally triggered by human errors, storage reductions, safety breaches, malware, and so on.

Logical knowledge integrity 

Logical knowledge integrity pertains to how the knowledge is saved and modelled inside the database, and the logical constraints applied to maintain the knowledge correct, legitimate, and constant throughout a number of sources.

Logical knowledge integrity is additional divided into 4 sorts:

Entity integrity 

Entity integrity means uniquely figuring out every entity in your database. This helps to keep away from duplicate information since each new document will need to have a singular identifier. These identifiers – additionally known as major keys in relational databases – can’t be null and is normally referenced in different datasets to forestall knowledge duplication. For instance, in a buyer database, SSN can be utilized because the distinctive identifier that ensures the entity integrity of the dataset.

Within the absence of uniquely figuring out attributes, advanced knowledge matching and fuzzy matching algorithms are required to match knowledge precisely and discover out which information belong to the identical entity.

Referential integrity 

Referential integrity refers back to the presence of international keys in a relational database. International keys are created to discuss with an current entity in one other desk. Relating information on this method avoids creating duplicate document entries, and utilizing info from an current schema desk. For instance, an worker database can have worker info in a single desk, and job function info in one other. And international keys are used to narrate/point out a job function within the worker info desk.

An necessary factor to notice right here is {that a} desk has a singular, not-null listing of major keys, however a number of distinctive information can share the identical international key (as a number of individuals can have the identical job function within the group).

Area integrity 

Area integrity means right (domain-specific) values have been utilized in every column of the database. For instance, in an worker database the place tackle info is added, the column Nation can have an inventory of potential values, and any worth that doesn’t fall in that listing is wrong and have to be up to date and applied in a typical format(may be finished by tackle standardization).

Person-defined integrity 

When customers outline their very own customized guidelines or constraints on a column, it’s termed as user-defined integrity. For instance, if a person defines that the lead supply for prospects database may be: Google Adwords, Web site, or Chilly Name, then a price outdoors of those three might be invalid.

Threats to knowledge integrity 

The phrases ‘knowledge integrity’ and ‘knowledge safety’ are used interchangeably however they aren’t the identical. Information safety measures are carried out to realize knowledge integrity. Furthermore, sustaining knowledge integrity is a posh activity and knowledge safety is one strategy to obtain knowledge integrity, however there are extra, since numerous issues pose as threats to knowledge integrity. A couple of of them are highlighted under:

Human error 

Each 400 per 10,000 entries in a database are inaccurate as a result of human error. It is a vital quantity, and the implementation of distinctive identifiers, integrity constraints, and different validation checks are all overridden as a result of human errors.

Inconsistencies in knowledge codecs 

With out correct knowledge codecs and kinds outlined, knowledge values inside the identical column are saved utilizing a special sample and format, which results in inconsistencies within the database. To forestall such inconsistencies, you will need to outline validation patterns and the right knowledge sorts.

Integration error 

Whereas integrating knowledge from a number of sources, knowledge integrity is normally compromised. The reason is the distinction in knowledge construction, validation checks, and integrity constraints throughout every supply. One knowledge may be saving Cellphone Quantity as char knowledge sort with max char restrict of 15, whereas the opposite is saving it as quantity knowledge sort, with max char restrict of 13.

Inner privateness breaches 

This normally occurs when your knowledge lands within the flawed arms, both an worker who’s misusing the corporate’s knowledge repository, or hackers making an attempt to interrupt by way of your firewall to get to the knowledge. In any case, securing the knowledge from such privateness breaches is a crucial activity.

Indicators of knowledge integrity 

To grasp whether or not your knowledge has integrity, it’s good to search for the next indicators:


Is your knowledge current on the proper place and is accessible at any time when wanted on the proper time? If there isn’t any correct or easy accessibility to your knowledge, then your knowledge may be on the danger of dropping its integrity. Quicker and optimized retrieval of knowledge from database is a key signal that the info’s integrity is being maintained.


Do values of a column in your dataset have the identical knowledge sort and format? Legitimate knowledge is definitely seen by noticing what number of values in your database don’t conform to applicable validation checks, for instance, a creation date for a document having a price ‘21.21.21’. The day and yr 21 make sense, however the month is invalid.


Does your database comprise loads of null values? In case your dataset doesn’t have the document of sure values, then it’s higher to decide on a generic non-null time period (similar to Not supplied or N/A), reasonably than leaving the column values empty. This may assist you to to grasp whether or not the values are lacking or incomplete, or had been intentionally left empty.


Do your dataset information uniquely establish entities? That is seen by assessing whether or not all information within the database replicate a singular identification and one entity info doesn’t span a number of information. In case your dataset accommodates duplicates, you will have to make use of knowledge matching algorithms to establish which information belong to the identical entity. In case duplicates are non-exact, you could require a mix of fuzzy matching algorithms to compute match confidence ranges, and decide.

Eventualities – the place knowledge integrity turns into essential 

Now that we’ve got lined the fundamentals of knowledge integrity, let’s talk about the real-world situations the place knowledge integrity performs a big function.

Masking personally identifiable info 

A standard observe to cover personally identifiable info is to masks precise knowledge with dummy knowledge. This course of is extensively utilized in healthcare and different authorities institutes to guard particular person privateness. If knowledge integrity isn’t maintained throughout the dataset, it may very well be very troublesome to realize the precise knowledge again from the dummy knowledge, because the authentic knowledge was inaccurate.

Guaranteeing compliance with knowledge requirements 

Compliance requirements, similar to HIPPA, GDPR, and so on. state the significance of knowledge integrity. For instance, GDPR Article 5(1) states that non-public knowledge needs to be:

“Correct and, the place mandatory, stored updated; each cheap step have to be taken to make sure that private knowledge which might be inaccurate, having regard to the needs for which they’re processed, are erased or rectified immediately;”

This clearly depicts how essential it’s to take care of knowledge integrity in your database for complying to mandatory requirements.

Driving enterprise intelligence 

Dependable knowledge insights are the largest advantage of capturing knowledge and sustaining it in a clear, standardized method. Information analysts spend 80% of their time managing knowledge and correcting knowledge errors and solely 20% of the time in really analyzing and deriving insights from it. Organizations are more and more using programs that course of their knowledge and provides descriptive insights, however nonetheless discovering it troublesome to belief these insights. The rationale goes again to how the knowledge is being captured, structured, and associated throughout tables within the database.

Conclusion – knowledge integrity produces dependable insights 

On this article, we lined fundamental and superior facets of knowledge integrity, in addition to talked about a couple of situations the place knowledge integrity turns into essential. Though sustaining the integrity of your knowledge looks like a resource- and time-intensive initiative, but it surely saves you time within the longer run, as your knowledge-driven insights grow to be extra dependable, correct, and actionable.


The put up Information integrity – A priority of data-driven organizations appeared first on Datafloq.



Most Popular

Recent Comments