Recently the Big Data buzz has faded as companies shift from awareness to action. That shift brought numerous challenges, and Waterline Data came into existence to overcome them. The company was formed in 2013 on the realization that while “Big Data” represented a revolution in data analytics, the increase in volume, velocity and variety of data also made it difficult to find and use the valuable information buried within the massive landfill of dark data.
Revolutionizing the Way of Handling Big Data
Working from the principle of “connect the right people to the right data,” Waterline is dedicated to automating the discovery and cataloging of data so that organizations can spend more time using data and less time searching for it. This also helps them better comply with data regulatory requirements and reduce the costs associated with data redundancy and data hoarding. By better managing clients’ data, Waterline helps its customers reduce the cost of data management, unlock hidden value and unleash competitive advantage.
Pushing Waterline Data to Attain Success
Kaycee Lai, President and COO of Waterline Data, oversees all aspects of Waterline’s strategy, marketing, field operations, customer success and business development. Prior to joining Waterline, Kaycee served as SVP of Products & World Wide Field Operations for Primary Data. There, he helped launch DataSphere, a solution designed to manage data across various silos. In 2013, Kaycee joined VMware via its acquisition of Virsto, the technology behind VMware’s rapidly growing hyper-converged solution, vSAN. At Virsto, Kaycee helped the company shift its strategy, which resulted in a 500% increase in revenue. Prior to Virsto, Kaycee was VP of WW Sales at Delphix, where he implemented the company’s go-to-market strategy and grew sales at a 200%+ pace for four consecutive quarters. Previously, he held various leadership roles at EMC, including general manager for a $120 million business unit across several countries. Prior to EMC, Kaycee was an early pioneer of data de-duplication with Avamar Technologies, where he was part of the team that led a successful acquisition by EMC. Before Avamar, Kaycee also held various roles at Microsoft and EMC.
Rendering Tailor-made Products and Services
Waterline is well known for offering a suite of data-catalog-based products that connect the right people to the right data, with solutions that enable self-service data analytics, data governance and data rationalization. These solutions are built on a Smart Data Catalog platform that automates the discovery, matching and tagging process and keeps the catalog up to date by incrementally scanning the data itself rather than only looking at historical SQL logs.
Discover: Waterline Data Fingerprinting™ automatically and incrementally fingerprints customer data, establishing a unique signature for each attribute. It analyzes data values, profiles the data and uses that information to do a preliminary organization of the data.
Organize: Waterline automatically matches the data fingerprints to glossary terms. Unmatched terms are then matched using crowdsourcing. Once a field is tagged by an analyst, the fingerprint of that field is matched against the fingerprints of other fields and the same tag is automatically propagated.
Curate: The automated matches are reviewed by data analysts or data stewards for approval. The system learns from this process and tunes the automated matching algorithm. Sensitive and private data tags are passed directly to Apache Ranger and Cloudera Sentry to enable tag-based access control.
Search, Rate and Collaborate: Business professionals can search for data and data redundancy using the Waterline Smart Data Catalog search UI. They can also access the catalog through integrated third-party data wrangling or business intelligence applications, all in a controlled and governed manner.
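The fingerprint-and-propagate idea described in the Discover and Organize steps can be illustrated with a small sketch. This is not Waterline's actual algorithm, which the article does not describe in detail; it is a toy stand-in that assumes a fingerprint is simply the distribution of coarse value "shapes" in a column. The function names (shape, fingerprint, similarity, propagate_tags), the sample columns and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import re


def shape(value: str) -> str:
    """Reduce a value to a coarse pattern: digits become 9, letters become a."""
    s = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", s)


def fingerprint(values):
    """Toy fingerprint: relative frequency of each value shape in a column."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def similarity(fp_a, fp_b):
    """Overlap of two shape distributions, from 0.0 (disjoint) to 1.0 (identical)."""
    patterns = set(fp_a) | set(fp_b)
    return sum(min(fp_a.get(p, 0.0), fp_b.get(p, 0.0)) for p in patterns)


def propagate_tags(tagged, untagged, threshold=0.8):
    """Suggest the best-matching known tag for each untagged column.

    Columns whose best score falls below the (arbitrary) threshold get no
    suggestion and would be left for a human to tag.
    """
    suggestions = {}
    for name, values in untagged.items():
        fp = fingerprint(values)
        best_tag, best_score = None, 0.0
        for tag, examples in tagged.items():
            score = similarity(fp, fingerprint(examples))
            if score > best_score:
                best_tag, best_score = tag, score
        if best_tag is not None and best_score >= threshold:
            suggestions[name] = (best_tag, best_score)
    return suggestions


# Hypothetical columns: two already tagged by an analyst, two not yet tagged.
tagged = {
    "us_phone": ["415-555-0100", "650-555-0199", "408-555-0123"],
    "email": ["ann@example.com", "bob@example.org"],
}
untagged = {
    "contact_no": ["212-555-0147", "312-555-0188"],
    "notes": ["call back Monday", "n/a"],
}
suggestions = propagate_tags(tagged, untagged)
print(suggestions)
```

In this sketch the contact_no column receives the us_phone tag because its values share the same shape, while the free-text notes column matches nothing and is left untagged. In terms of the steps above, that is the kind of column that would fall to crowdsourcing and steward review. A production system would presumably rely on far richer profiling than value shapes.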
Creating Employee Friendly Work Culture
Unlike many recent startups in Silicon Valley, Waterline Data stands out for its values, mission and vision. In the company's words: “we value having a sense of urgency vs creating a sense of chaos; we value smart nice people over smart jerks; we value cool technology that solves real business problems over just cool technology. Our leadership team knows what it takes to build a software business, having grown and managed organizations ranging from startups to large multinationals. We value people who want to work as part of a team to solve the hard problems that will make a difference for our customers and for our business.”
Innovating across Multiple Dimensions
Their strength lies in innovating across multiple dimensions. The first is the ability to automate the discovery and tagging of data at scale to populate a data catalog. Doing this accurately, without too many false positives, is a major technical challenge that they have successfully overcome. A second critical area of innovation is connecting the information collected in the data catalog to specific use cases. Their ability to enable self-service data analytics, data governance and data rationalization comes directly from building specific value on top of the catalog and then integrating the catalog with partners to provide customers a complete end-to-end solution. The last area of innovation is the ability to connect to and catalog an ever-increasing variety of data, ranging from simple files stored in Hadoop, to complex JSON files, to relational databases like Teradata, Oracle and MySQL.
Future Prospects
They have been growing at a rapid pace, roughly doubling sales every quarter for the past four quarters. Additionally, Waterline Data has increased its average deal size by 2x over the past year. So not only is the volume of business going up, the value of each deal is also increasing. They are expected to continue growing rapidly as they expand to organize even more types of data across more data sources and extend their solutions to deliver even more end-to-end value in their target problem spaces: data self-service, data governance and data rationalization.