Findings on Database Management

Technical decisions around data persistence are hard, which is why we surveyed 583 IT professionals on everything from current DBMS and ORM usage to modern database engines' data structures and access patterns to storing data on a mobile device.

The demographics of this survey are as follows:

  • 69% of these respondents use Java as their primary programming language at work.
  • 68% develop primarily web applications.
  • 66% have been IT professionals for over 10 years.
  • 45% work at companies whose headquarters are located in Europe, 27% in the USA.
  • 44% work at companies with more than 500 employees, 19% at companies with more than 10,000 employees.

Give the key findings below a read and let us know what you think.

Oracle, MySQL, and SQL Server Remain Head and Shoulders Above the Rest; Oracle and MySQL Remain Neck-and-Neck
The two most popular DBMSes (Oracle and MySQL) are used in production by 51% and 49% of respondents, respectively - significantly ahead of the third-ranked DBMS (SQL Server, at 34%). The top three, and the tight race between the top two, have not changed in years, either among our survey respondents or on the DBMS ranking aggregator db-engines.com. The nearest NoSQL challenger, MongoDB, remains a distant fourth in production environments.

NoSQL - Especially Document-Oriented - DBMS Adoption Is Significantly Greater in Non-Production Environments
Non-production environments are friendlier to less mature and less thoroughly supported database management systems, and are more likely to be affected by the desire to optimize for structural fit and ease of access. In production, where data stores are often managed by specialist non-developers, factors other than developer experience and the optimal match between data processing and storage and retrieval algorithms weigh more heavily in DBMS selection. NoSQL and generally less mature and/or less supported offerings should therefore be more popular in non-production environments. DBMSes that implement simpler storage models well suited to lightweight prototyping - especially, therefore, document-oriented DBMSes - should gain an extra boost in development environments.

Accordingly, the gap between production and non-production usage is greatest for the two most mature commercial DBMS offerings: Oracle (51% in production vs. 37% in non-production) and SQL Server (34% in production vs. 25% in non-production). Meanwhile, the gap between the most popular NoSQL offering in production (MongoDB, at 20% adoption) and the least popular of the top three (SQL Server, at 34%) shrinks to within the survey's margin of error in non-production environments (where MongoDB enjoys 25.4% adoption vs. SQL Server's 24.6%).

MongoDB's (static-schema-free) document orientation, familiar JSON-like document format (ordered sets of fields whose values can take a variety of types), and widespread connector availability make it easy to set up without heavyweight data modeling and relatively straightforward to use for many less-data-intensive applications without cramping application architecture or code. Indeed, many non-relational stores are easier to spin up quickly than a full-power RDBMS. Some benefits of the relational model (especially integrity enforcement) are less relevant in non-production environments, where updates don't always need to propagate across all entities.
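
To make the "no heavyweight data modeling" point concrete, here is a minimal sketch contrasting a fixed relational schema with a schema-free collection. It uses SQLite and a plain Python list as stand-ins; the `device` table, the `firmware` field, and both helper functions are hypothetical names chosen for illustration, not part of MongoDB or any other product's API.

```python
import sqlite3

# Hypothetical fixed schema: only `name` and `location` are known columns.
SCHEMA = "CREATE TABLE device (name TEXT PRIMARY KEY, location TEXT)"

def insert_row(conn, row):
    """Insert a dict into the fixed-schema table; unknown columns raise an error.

    Column names are interpolated into the SQL text, so this sketch is only
    suitable for trusted, hard-coded keys.
    """
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO device ({cols}) VALUES ({marks})", list(row.values()))

def insert_document(collection, doc):
    """Append a dict to an in-memory 'collection'; any shape is accepted."""
    collection.append(dict(doc))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    insert_row(conn, {"name": "sensor-1", "location": "lab"})
    try:
        # A new field requires a schema change (ALTER TABLE) before this works.
        insert_row(conn, {"name": "sensor-2", "firmware": "1.2"})
    except sqlite3.OperationalError as exc:
        print("relational insert rejected:", exc)

    collection = []
    insert_document(collection, {"name": "sensor-1", "location": "lab"})
    insert_document(collection, {"name": "sensor-2", "firmware": {"version": "1.2"}})
    print(len(collection), "documents stored without a schema change")
```

The trade-off runs the other way in production: the same rejection that slows prototyping is what enforces integrity once many applications depend on the data.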

Note: of the top three DBMSes, only MySQL enjoys greater adoption in non-production than in production environments. MySQL is especially likely to be many developers' default non-production RDBMS, presumably because it is popular, open source, mature, familiar, and supported by a strong community. (For the importance of familiarity in developers' preference for a particular data persistence technology, see the upcoming section on matching storage model to data structure.)

Applications Are Almost as Likely to Use Two Storage Models as One
Developers may use more than one storage model in different applications with no reference to the work done by the application; variety by developer speaks more about the human than about the technology. But variety of storage models within a single application indicates "polyglot" persistence - that is, how many storage models are used to persist data where technical and business needs overlap. Among our respondents, nearly as many typically use two storage models in their applications (38%) as use one (40%). This result confirms that "NoSQL" is better understood as "Not Only SQL," because the most popular storage model (given DBMS and query language usage data) remains relational. Based on DBMS adoption data, the second most popular storage model by user count is probably document-oriented; but because other storage models (especially column-oriented, graph, and key-value) are particularly well suited to analytical processing of many data rows, further research is required to discover storage model usage by data volume. In any case, the near-parity between one and two storage models per application indicates increasing interest in matching persistence mechanism to the structures of data to be persisted.

Matching Storage Model to Data Structure: Modeling Graph Data
Graph structures do not fit the relational model comfortably. In a relational database, most (Shannon) information is stored in the columns and rows of each table; the schema is a technical construct designed to enforce data integrity, make the data model more legible, and make the querying model more efficient, not to encode more information. In a graph, however, most information is stored in the structure of the nodes and the edges; additional information about nodes and edges is treated as metadata. Yet many real-world entities are most naturally represented as graphs: social, travel, and trade networks; packet routes; control flows; etc. Storing graph structures in tabular storage is inelegant and inefficient even at first, static-only glance, and the problem gets worse in a dynamic setting. Because a graph's computational complexity may diverge wildly from its combinatorial complexity, reducing a graph to a relational schema (e.g., two-column mapping tables that relate a row in one table to a row in another - that is, modeling nodes as columns and edges as rows in a new table) may work far better for some algorithms than for others (in ways that are not immediately obvious from the graph itself).
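
The two-column mapping-table approach described above can be sketched as follows. This is a minimal illustration using SQLite from Python's standard library; the `node` and `edge` table names and both functions are assumptions made for the example, not taken from the survey. Note that even a simple reachability question needs a recursive query once edges are stored as rows.

```python
import sqlite3

def build_graph(conn, edges):
    """Create hypothetical node and edge tables and load (src, dst) name pairs."""
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE edge (
            src INTEGER NOT NULL REFERENCES node(id),
            dst INTEGER NOT NULL REFERENCES node(id),
            PRIMARY KEY (src, dst)
        );
    """)
    names = sorted({n for pair in edges for n in pair})
    conn.executemany("INSERT INTO node (name) VALUES (?)", [(n,) for n in names])
    # Each edge becomes one row in the two-column mapping table.
    conn.executemany(
        "INSERT INTO edge (src, dst) "
        "SELECT s.id, d.id FROM node s, node d WHERE s.name = ? AND d.name = ?",
        edges,
    )

def reachable(conn, start):
    """Return sorted names of nodes reachable from `start`, excluding `start`."""
    rows = conn.execute("""
        WITH RECURSIVE walk(id) AS (
            SELECT id FROM node WHERE name = ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN walk ON edge.src = walk.id
        )
        SELECT node.name FROM node JOIN walk ON node.id = walk.id
        WHERE node.name <> ?
        ORDER BY node.name
    """, (start, start)).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_graph(conn, [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
    print(reachable(conn, "d"))
```

Each step of the traversal is a join against the edge table, which is one way the storage-structure mismatch shows up as cost when the graph grows.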

Nevertheless, three factors encourage developers and DBAs to store data that is naturally modeled as a graph in a relational DBMS: first, the maturity of relational DBMSes; second, the simplicity and familiarity of SQL (which 90% of respondents use regularly); and third, the availability and maturity of powerful object-relational mappers (ORMs) that make relational data easily accessible (often with automatic and highly effective optimizations) from application code.

Accordingly, only a small minority (20%) of respondents persist data that is naturally modeled as a graph in a specialized graph DBMS. Further, more respondents store naturally-graph data in a relational database without explicit modeling of edges as rows (39%) than with node-node mapping tables (31%). We expect this distribution to change as graph DBMSes and query languages grow more familiar, as the tooling ecosystem around these DBMSes approaches the maturity of ORMs, as inefficiencies introduced by storage-structure mismatch grow more expensive with increasing graph data volume, and as use cases (and corresponding storage and retrieval algorithms) grow more varied.

Two possibly linked correlations are also worth noting. First, the largest chunk of respondents who store graphs in a relational database without explicit modeling of edges use Oracle (25%), probably the most mature and most thoroughly optimized RDBMS. Second, the largest chunk of respondents who store graphs in a relational database with node-node mapping tables use MySQL (24%), which is also the only RDBMS that gains popularity in non-production vs. production environments. This difference may be a function of the greater likelihood that MySQL will be used for experimental purposes, where graph problems, being conceptually farther from actuarial use (for which relational databases are a more natural fit), are more likely to appear.

Matching Processing Approach to Storage Model: Use and Enjoyment of ORMs
Most developers use SQL (90%), but relational algebra does not naturally capture object orientation. Objects do not fall into Venn diagrams; but objects and relational tables do share enough structure that, for many simple (few-join) access patterns, the so-called object-relational impedance mismatch does not cause catastrophic performance or integrity loss. Accordingly, object-relational mappers (ORMs) are not only widely used, but also preferred by a majority of developers. In response to our question, "What persistence-related technology do you most enjoy working with?" 58% of respondents answered that they most enjoy working with ORMs. Of these, 70% specifically enjoyed working with Hibernate - probably a function of both Hibernate's maturity and our respondents' heavy focus on Java. Although the tail of most-enjoyed data persistence technologies was quite long (26 distinct technologies), Spring Data emerged as the most popular comprehensive data access framework by far (16%).
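
As a rough illustration of what an ORM automates in the simple, few-join case, the sketch below maps one object with a nested collection onto two tables by hand. It is not how Hibernate or Spring Data work internally; the `Author` class, the `author` and `book` tables, and the `save`/`load` helpers are hypothetical names invented for this example.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Author:
    id: int
    name: str
    # A nested collection has no direct column equivalent; it needs its own table.
    books: list = field(default_factory=list)

SCHEMA = """
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title TEXT NOT NULL
);
"""

def save(conn, author):
    """Flatten one object into a parent row plus one child row per book."""
    conn.execute("INSERT INTO author (id, name) VALUES (?, ?)",
                 (author.id, author.name))
    conn.executemany("INSERT INTO book (author_id, title) VALUES (?, ?)",
                     [(author.id, title) for title in author.books])

def load(conn, author_id):
    """Rebuild the object from a single join; returns None if no such author."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM author a "
        "LEFT JOIN book b ON b.author_id = a.id "
        "WHERE a.id = ? ORDER BY b.id",
        (author_id,),
    ).fetchall()
    if not rows:
        return None
    titles = [title for _, title in rows if title is not None]
    return Author(author_id, rows[0][0], titles)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save(conn, Author(1, "Ada", ["Notes", "Sketches"]))
    print(load(conn, 1))
```

With one join the round trip is cheap; the mismatch becomes more costly as object graphs deepen and each level adds another table and join.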

Reasons Developers Enjoy Working with a Data Persistence Technology
Just under two-thirds of all respondents who named the persistence-related technologies they enjoy working with also specified why they enjoyed working with those technologies. Grounded-theoretic "bucketing" analysis yielded seven (somewhat overlapping) reasons to enjoy a persistence technology (listed in order of popularity): ease of use, simplicity, adherence to standards, familiarity, performance, high level of control, and scalability. The most popular reason by far was ease of use (34%), followed by simplicity in distant second (21%). The top four reasons relate more directly to developer experience than to outcomes (such as performance and scalability), as the wording of the question ("enjoy") indicated. Additional research is required to determine how developer experience relates to persistence-related technology selection, especially because many less-familiar (NoSQL) technologies are optimized for scalability and general performance for certain use cases.

Handling Scale: Data Is Partitioned as Frequently as It Is Not, But This Is Often Successfully Made Invisible to Developers
Modern storage engines, across all storage models, are highly optimized for current hardware, access patterns, and network performance. Theoretically massive inefficiencies of the relational storage model sometimes dominate the advantages offered by a higher degree of maturity among RDBMSes, although newer engines store data in structures that are less narrowly tuned to read-heavy loads using slow (spinning) physical media than (for example) B+ trees. But as Big Data strategies aggressively drive data storage and processing needs, data scale becomes increasingly difficult to manage.

To keep performance and availability high, data is often partitioned along physical and logical lines. Among our survey respondents, 38% partition data in some way (vertical, horizontal, or functional) vs. 40% who do not - a difference within the survey's margin of error (5%). Two research follow-ups would prove interesting: first, what specific data volumes (or velocities), application requirements, and infrastructure constraints drive what kinds of partitioning; and second, which storage models are more likely to require partitioning (although application constraints presumably affect both choice of storage model and partition size/need). It would appear, however, that distributed data techniques designed to manage CAP trade-offs are often effective: 22% of respondents - most of whom are developers and not DBAs - did not know whether their databases were partitioned, a sign that, for nearly a quarter of developers, physical splitting of data had no visible impact on their development work.
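
One way partitioning can stay invisible to application code is sketched below: a toy horizontally partitioned key-value store that routes each key to a shard by hash. This is an illustrative assumption about one common technique (hash-based horizontal partitioning), not a description of any respondent's setup; `shard_for` and `ShardedStore` are hypothetical names, and each "shard" is just an in-memory dict.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    SHA-256 is used instead of Python's built-in hash(), which is salted per
    process for strings and so would not route consistently across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedStore:
    """Toy horizontally partitioned store; callers never see shard indices."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)

if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(100):
        store.put(f"user:{i}", i)
    print(store.get("user:42"))
    print([len(shard) for shard in store.shards])
```

Callers use only `put` and `get`, which mirrors the survey observation that many developers cannot tell whether their data is partitioned. A simple modulo scheme like this one remaps most keys when the shard count changes, which is one reason real systems tend to use more elaborate placement strategies.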

For more information on Database and Data Persistence Tools and Techniques, please visit: https://dzone.com/guides/data-persistence-2

More Stories By John Esposito

John Esposito is Editor-in-Chief at DZone, having recently finished a doctoral program in Classics from the University of North Carolina. In a previous life he was a VBA and Force.com developer, DBA, and network administrator. John enjoys playing piano and looking at diagrams, and raises two cats with his wife, Sarah.
