



Data Integration Challenges

One of the most fundamental challenges of the data integration process is setting realistic expectations. The term data integration conjures up the perfect coordination of diverse databases, software, equipment and personnel into a smoothly functioning alliance, free from the persistent headaches that characterize less comprehensive information management systems. Think again.

The requirements analysis stage provides one of the best opportunities in the process to recognize and digest the full extent of the complexity of the data integration task. Careful attention to this analysis is probably the most important ingredient in creating a system that will survive to see adoption and full use. However, as the field of data integration advances, other common obstacles and compensating solutions will be readily identified. Current integration practices have already highlighted some familiar challenges along with strategies to address them, as outlined below.

Heterogeneous Data

Challenges: For most transportation agencies, data integration involves synchronizing huge amounts of variable, heterogeneous data from existing internal systems that differ in data format. Legacy systems may have been built around flat-file, network, or hierarchical databases, unlike newer generations of databases that use the relational model. Data in different formats from external sources continues to be added to existing databases to improve the value of the information. Each generation, product and internally developed system has its own requirements for storing and retrieving data. Data integration may therefore involve various strategies to deal with heterogeneity.
In some cases, this effort becomes a major exercise in data homogenization, which may not improve the quality of the data offered.

Strategies: Detailed analysis of data characteristics and uses is necessary to mitigate problems associated with heterogeneous data. First, a model is chosen (federated environment or data warehouse) that meets the requirements of the business applications and other uses of the data. Next, the database developer needs to ensure that the various applications can use this format or, alternatively, that standard operating procedures are adopted to convert the data to another format. Bringing disparate data together into one database system, or migrating and merging highly incompatible databases, is a daunting task that can sometimes seem insurmountable. Fortunately, software technology has advanced to minimize obstacles through a series of data access routines that allow structured query languages to access almost any DBMS or data file system, relational or non-relational.

Bad Data

Challenges: Data quality is a major concern in any data integration strategy. Existing data must be cleaned before conversion and integration, otherwise an agency will almost certainly face serious data problems later. Legacy data impurities have a cumulative effect; by nature, they tend to concentrate around users of large volumes of data. If this information is corrupted, so are the decisions that flow from it. It is not uncommon for undiscovered data quality issues to appear during the process of cleaning information intended for use by the integrated system. The problem of bad data calls for regular audits of the quality of the information used. But who holds ultimate responsibility for this work is not always clear.

Strategies: The issue of data quality exists throughout the life of any data integration system. It is therefore best to establish both practices and responsibilities up front, and to plan for each to continue in perpetuity.
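As one illustration of the kind of pre-integration audit discussed above, here is a minimal Python sketch. The field names (asset_id, year_built, length_m) and the three rules are hypothetical, chosen only for illustration; a real agency would define its own rules and thresholds. The sketch separates records that pass every rule from those that fail, and counts failures per rule so that recurring problems are visible before conversion.

```python
from collections import Counter


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical quality rules: each returns True when the record is BAD.
RULES = {
    "missing_id": lambda r: not str(r.get("asset_id") or "").strip(),
    "bad_year": lambda r: not (
        _is_number(r.get("year_built")) and 1900 <= r["year_built"] <= 2100
    ),
    "bad_length": lambda r: not (
        _is_number(r.get("length_m")) and r["length_m"] >= 0
    ),
}


def audit(records):
    """Return (clean_records, Counter of violations keyed by rule name)."""
    clean, problems = [], Counter()
    for rec in records:
        failed = [name for name, is_bad in RULES.items() if is_bad(rec)]
        if failed:
            problems.update(failed)
        else:
            clean.append(rec)
    return clean, problems


# Made-up sample records standing in for a legacy extract.
legacy = [
    {"asset_id": "B-1", "year_built": 1987, "length_m": 42.5},
    {"asset_id": "", "year_built": 1987, "length_m": 10.0},
    {"asset_id": "B-3", "year_built": "unknown", "length_m": -5},
]

clean, problems = audit(legacy)
print(len(clean), dict(problems))
```

Keeping the rules in a single named table is one possible design choice: it gives developers and users a shared, reviewable list of controls, and the per-rule counts can feed a recurring audit report.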
The best processes result when developers and users work together to determine the quality controls that will be put in place during both the development phase and the continued use of the system.

Lack of Storage Capacity

Challenges: The unanticipated need for additional performance and capacity is one of the most common challenges in data integration, especially in data warehousing. Two storage requirements typically come into play: extensibility and scalability. Anticipating the scale of growth in an environment where storage requirements can increase exponentially once a system is launched raises concerns that the cost of storage will outweigh the benefits of data integration. Introducing such massive amounts of data can push the limits of hardware and software. Developers may then be forced into costly fixes if an architecture able to handle much larger amounts of data has to be retrofitted into the planned system.

Strategies: Alternative storage is becoming routine for data warehouses that are likely to grow in size. Planning for such options helps keep database expansion affordable. The cost per gigabyte of disk storage continues to fall as technology improves. From 2000 to 2004, for example, the cost of data storage fell tenfold. High-performance storage drives are expected to follow the same downward price trend.

Unforeseen Costs

Challenges: Data integration costs are largely driven by elements that are difficult for the uninitiated to quantify, and therefore to predict. These may include:

  • Labor costs for initial planning, assessment, programming and additional data acquisition.
  • Software and hardware purchases.
  • Unforeseen technological changes and advancements.
  • Labor costs and direct costs of data storage and maintenance.

It is important to note that regardless of efforts to streamline maintenance, the realities of a fully functioning data integration system may require much more maintenance than expected.
Unrealistic estimates may be driven by an overly optimistic budget, especially in these times of budget deficits and doing more with less. More users, more analytics needs, and more complex requirements can lead to performance and capacity issues. Limited resources may result in extended project timelines without commensurate funding. Unforeseen or new problems may require expensive consulting help. The dynamic atmosphere of today's transportation agency must also be taken into account: staff shortages, changes in business processes, hardware and software problems and changes in management can all lead to additional expenses. The investment in time and labor required to extract, clean, load and maintain data can grow if the quality of the data presented is low. It is not uncommon for this to generate unforeseen labor costs that are alarmingly disproportionate to the total project budget.

Strategies: The approach to estimating project costs must be both far-sighted and realistic. This requires investment in experienced analysts, as well as cooperation, where possible, between sister agencies on lessons learned. Special efforts should be made to identify items that may seem unlikely but could have a significant impact on the total project cost. Extraordinary care in planning, investing in expertise, obtaining stakeholder buy-in and participation, and managing the process will help ensure that cost overruns are minimized and, when they occur, can be resolved as effectively as possible. Data integration is a fluid process in which such overruns can occur at every stage, so trained staff and vigilant monitoring are likely to pay dividends rather than add to costs.

A viable data integration approach must recognize that the better data integration works for users, the more fundamental it will become to business processes. This level of usage must be supported by consistent maintenance.
It might be tempting to think that a well-designed system will, by nature, work without much maintenance or modification. In fact, the best systems and processes tend to thrive on the routine care and support of well-trained staff, a fact that wise managers generously anticipate in the integration plan and budget.

Lack of Staff Cooperation

Challenges: User groups within an agency may have developed databases themselves, sometimes independently of information systems staff, that meet their particular needs very well. It is natural that the owners of these self-contained functional units may be skeptical that the new system will meet their needs as effectively. Other proprietary interests may come into play. For example, staff in a division may not want the data they collect and track to be transparently visible to headquarters staff at all times without the ability to address the nuances of what the data seems to show. Owners or users may be concerned that line managers, without appreciating the particularities of a given method of operation, will gain more control over how data is collected and accessed organization-wide. In some agencies, the level of staff, consultants and financial support from higher levels of management may not be sufficient to allay these fears and secure cooperation. Top management must be fully invested in the project. Otherwise, it is less likely that the strategic data integration plan and associated resources will be approved. The additional support required to engage everyone in the agency and convey to them the need for and benefits of data integration is unlikely to come from leaders who lack awareness of, or commitment to, the benefits of data integration.

Strategies: Any large-scale data integration project, regardless of the model, requires executive management to be fully involved. Without this, the initiative may simply fail. Inform and involve the diversity of stakeholders during the.