Feature Articles

Seagate Lyve Cloud: meeting both customer and government concerns

By Ken Wong - 1 Nov 2022


Seagate launched its online cloud storage platform, Lyve Cloud, in 2021 to take advantage of its own expertise in the space. It followed this with the launch of Lyve Analytics this year.

Lyve Analytics came about from Seagate’s own experience of making sense of the sheer amount of data it had collected internally. Every day, Seagate's seven large-scale, AI-enabled manufacturing sites around the world generate over 50TB of image and parametric data, which feeds a 15PB data lake. From that data lake, Seagate was able to glean the knowledge to save time in its own manufacturing processes.

Globally, customers include the Alfa Romeo F1 Team ORLEN, CrowdStrike, and video-conferencing solutions provider Zoom.

But like any cloud storage platform, questions about privacy, data sovereignty, and security remain. So how does Seagate handle these concerns? More importantly, how does it help its customers handle them?

We spoke to Seagate’s Chief Information Officer, Ravi Naik, to find out how Seagate helps customers achieve their goals while complying with government standards and regulations across multiple markets.

Ravi Naik is Seagate's CIO. Image source: Seagate.

One of the things that struck me was the different Lyve Cloud locations that you have around the world. One question that always comes to my mind: is data sovereignty still an issue? Should we still worry about it? And why is it still a concern with companies, customers, and their own internal users?

Data sovereignty, in my opinion, is still a topic. I wouldn't say “concern”, but it's a topic of great interest for businesses. And I think it is still very early days in terms of what the rules are going to be, what rules are going to be enabled and enforced.

You're starting to see some of that happen in certain countries, but there is no uniformity. And I believe that will continue for some period of time. And for businesses to operate in this environment of continuous flux, it’s going to be quite challenging, especially if you are taking a one-size-fits-all approach.

And that's something to keep in mind.

So, when you design solutions and services, building data sovereignty into the architecture of your platforms is going to be critical. And that's where Lyve Cloud actually has an advantage. Given that we are a little late to the game, we can address what the market needs as we roll solutions out.

So what we have done, essentially, is say that we're not a hyperscaler; we are a cloud at the Metro Edge. Our basic building block is a five-and-a-half-petabyte footprint, so we can move very rapidly. At the same time, we can be in multiple, geographically distributed locations. We don't have to be in, let's say, Germany to serve all of Europe. We can be in Germany, in Switzerland, in Austria, in every location, and that way we can cater to the local requirements when it comes to data sovereignty. That's the philosophy behind how we have architected Lyve Cloud: a smaller footprint, but more locations.


We have standards like ITIL, the PDPA in Singapore, the GDPR in Europe, and so on. So when customers come to you and ask for help deciding which to follow, how do you advise them? And how do you work on design requirements that will meet their own internal needs but still comply with government legislation?

So, there are a few things I will touch upon here. Number one is certifications. If we get into this chase of certifications, it's a never-ending chase, because there are globally accepted certifications, and then there are local and regional ones.

Now if you look at the United States, California itself is coming up with its own regulation, which is like the GDPR. So it's an evolving kind of environment. And what we look at is really: what are the broad-based certifications needed?

So: SOC 2, ISO 27001, HIPAA, FedRAMP, GDPR. These are broad-based certifications. And then the approach is to go out and say: okay, if you want localisation, we can support that. If you have a tenant in, let's say, London, and from a regulatory perspective you want a hard stop on moving that data out of London to a different location, you have the ability to take it off the grid and say: these are specific, localised environments. That way, they don't show up as a possible target in other regions.

Today, if you go to the hyperscalers, you sign one contract, one account, swipe a credit card, and you get the whole world in front of you. You see every single region on the planet. And that essentially means you can start moving data back and forth without thinking about which regulations you’re not following, right?

Versus if you go to a localised model: if you have it in London, we don't open it up to the world. So that's really the approach we take, to slice and dice it so that we cater to each of the local requirements but have the global certifications in place.
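The localisation model Naik describes, pinning a tenant's data to one region and refusing moves elsewhere, can be sketched as a simple policy check. This is an illustrative sketch only; the class and method names are hypothetical and not part of any actual Lyve Cloud API.

```python
# Hypothetical sketch of a region-pinning policy check.
# Names and region identifiers are illustrative, not Seagate's actual API.

class TenantPolicy:
    def __init__(self, tenant_id, home_region, localised=False):
        self.tenant_id = tenant_id
        self.home_region = home_region
        self.localised = localised  # "off the grid": no cross-region moves

    def can_replicate_to(self, target_region):
        """A localised tenant's data may never leave its home region."""
        if self.localised:
            return target_region == self.home_region
        return True

# A London tenant with a regulatory hard stop on data movement.
policy = TenantPolicy("acme-ltd", "eu-london", localised=True)
print(policy.can_replicate_to("eu-london"))  # True
print(policy.can_replicate_to("us-east"))    # False
```

The point of the sketch is that the restriction lives in the platform's control plane, so a localised tenant simply never appears as a replication target in other regions.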


You mentioned hyperscale data centres and everything else. How do you see storage requirements evolving in the wake of hyperscale and hyperconverged infrastructure?

Right, so when I look at hyperscalers, their primary target is the whole growing datasphere, right? Data is growing, workloads are growing, the entire world is becoming more technology-driven, and there is no industry vertical that is not also a technology vertical. So the growth of data is going to drive the growth of storage. A large amount of mass-capacity storage is required in the world, and it continues to grow, right? So from that point of view, Seagate is very well positioned.

When it comes to hyperconverged, hyperconverged storage primarily targets specific workloads where you want compute and storage in close proximity. You want close to zero latency between them. These are very specific workloads.

And I believe there is a place for hyperconverged storage, and also for storage like what we provide with Lyve Cloud, which is primarily for unstructured data, right? Unstructured data is the fastest growing data in the world, given that there are videos and pictures and a lot of 4K solutions. In semiconductor manufacturing, like in our case, we actually do a lot of image analysis. So unstructured data is the fastest growing, and solutions like S3-compliant storage and Azure Blob are going to be the primary targets for that data. Of course, there is always going to be room for hyperconverged storage, and room for all-flash arrays; those serve very specific workloads.

Seagate used its own experiences in manufacturing and storage in creating Lyve. Image source: Seagate.

With regards to Lyve Cloud and unstructured data, how do you help customers who come to you and say, “I've got all this unstructured data and I've spent on big data solutions already, but they haven't really helped me make sense of what I have yet. Can you help me with this problem? Can you help me move beyond making sense of it (Big Data) and into machine learning and AI?”

That's a great question. And I say that because we've lived that life, right? The biggest challenge is to go beyond the hype. We are in a bit of a hype cycle when it comes to machine learning, AI, and analytics; there isn't any new VC pitch that you can make without having the term AI in it, right? Leave it out and you won't get any funding.

But once you go beyond the hype, beyond the buzz, the challenge, really, is that there are so many different offerings and tools and products. It's very easy to go spend some money and buy a tool or a suite of tools. But then that's where it stops: how do you take advantage of it? How do you extract value from it? That's where the biggest challenge is.

And it starts actually, with understanding what you're trying to solve. It's very important to have a clear, crisp understanding of the problem you're trying to solve, and of the outcome you're trying to get.

Once you have a very clear understanding of that, then you look at: okay, what is the data that you need to collect? In the early days, we used the term “data lake” quite a bit, and then the data lake became a data swamp. Why? Because all you do is collect data, all kinds of data, and then you don't know what to do with it, right? It just sits there.

So it's really important to understand the use case, and the problem statement, and then go collect the data that is actually going to help you solve that. Be very clear about which data you need to ingest in your data lake.

After identifying the problem and collecting the right data comes a critical part: organising that data. You need to organise the data so that it is in a format that is readable and understandable by ML models. Once you have that, you train the models. Then there is model upkeep, because models deviate, right? Over a period of time they start drifting.

So maintaining that whole ecosystem, right, from the problem statement to actually getting the ML models to work, is a very complex and elaborate process. Having the right skills is important. The right tool is a very small part of it; it is the skills, the experience, and the knowledge. And this comes from our own learning.
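The "model upkeep" step Naik mentions, models deviating over time, is commonly handled by monitoring for drift. A minimal sketch (pure Python; the feature values and the 0.25 threshold are arbitrary examples, not Seagate's method) compares a feature's live mean against the mean recorded at training time:

```python
import statistics

# Illustrative drift check: flag drift when a feature's live mean
# deviates from its training-time mean by more than `threshold`
# times the training standard deviation. Numbers are made up.

def drifted(training_values, live_values, threshold=0.25):
    mu = statistics.mean(training_values)
    sigma = statistics.stdev(training_values)
    live_mu = statistics.mean(live_values)
    return abs(live_mu - mu) > threshold * sigma

training = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
stable   = [10.1, 10.3, 9.9, 10.0]   # still looks like training data
shifted  = [13.0, 12.8, 13.5, 12.9]  # clearly deviated

print(drifted(training, stable))   # False
print(drifted(training, shifted))  # True
```

In practice teams monitor many features and model outputs this way and retrain when checks like this start firing, which is the "upkeep" part of the pipeline.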

We actually use a significant amount of machine learning in our manufacturing facilities, in our factory here in Singapore and all over the world. Because we have this high-speed manufacturing, it is important for us to do image analytics for quality and for yield management.

And from our learnings, what we understood is that the implementation is the most critical aspect. So what we have decided to do is provide this implementation service packaged along with our product. That way, we don't have to nickel-and-dime our customers for every little tier of the solution.

We bring the storage, the compute, the data engineering, and the labs. And along with that, we bring a group of data scientists and data engineers who will sit down, understand your business problem, formulate your data strategy, and help you in your journey to get outcomes.


You mentioned data lakes and data swamps. And they're not new; data lakes, as you said, have been around for a long time. We've all seen storage go from silos to pools of resources, from a single pane of glass where we can see all our data at a glance, to consolidation, to virtualisation, and to working with containerisation. What is different this time around? It seems like we're going around in cycles again; cloud computing used to be utility computing, and now it's back in different forms and variants. So data lakes, this time around, what's different about them?

So data lakes are not the Holy Grail they used to be. That's how I look at it. Data lakes were, at one point in time, the be-all and end-all of everything, right? Once you had a data lake, you'd achieved your objectives. But a data lake is essentially a big bucket of data. There's good data and bad data, and you don't know the difference.

What has changed this time around? Number one, we've learned, right? The industry has learned that collecting data is not the answer. It is collecting the right data, and getting intelligence out of that data, that is really the key. So there has been an evolution in thinking, and more experience of what works versus what doesn't. The data lake is not the outcome. The data lake is just the beginning, the repository; it is really the ML models and the outcomes they generate that are the focus.

Seagate's R&D in Singapore is done here. Image source: Seagate.

So as we look at IoT, and more devices at the edge, and we're pushing analytics out to the edge, how do you see Lyve Cloud moving to that edge, given that, you know, your Metro Edge is moving out that way?

IoT is designed to be at the edge. The idea behind Lyve cloud is to be at the Metro edge because we want to be much closer to the workloads. There are mission-critical workloads, which are latency sensitive.

They may struggle to actually run their analytics in the public cloud, in a hyperscale cloud simply because of latency. For example, if you're in a hospital where vital healthcare analytics is going on, you want to be very close to where the workload is, versus having all the data move into a centralised repository, run the analytics, and get the results back. Especially if it is real-time data.

That's where the cloud at the Metro Edge makes more sense: for mission-critical, latency-sensitive data, like manufacturing, healthcare, and so on. This is where Seagate sits. We at Seagate believe the greatest value is in letting customers run analytics close by: real-time analytics for manufacturing, hospitals, and so on and so forth.


Who do you partner with for your analytics at the edge? I've been looking at edge analytics solutions driven by SAP and by NVIDIA, and each is CPU-driven, or GPU-driven, or NPU-driven. I'd be curious to know: do you see a difference between them, and do you see one becoming more dominant than the others?

There are two elements here. One is the infrastructure component. And then there is the, what I call, the data engineering and ML Ops layer.

What you're referring to is essentially the infrastructure, whether it be storage that uses HDDs, flash, DRAM, or a combination of them all, like we have. We have three stacks of storage offerings in Lyve Cloud. And the compute can be CPUs or GPUs, again depending on the workload and the kind of analytics you're running. Certain analytics need GPUs; other analytics can run on CPUs.

We now have all that understanding. Five years ago, there was no clear understanding of that. It was more like: hey, I want GPUs. Everybody wanted GPUs, and then you'd have a bunch of GPUs sitting around, spinning, not being utilised. But it's much clearer now, right, from a workload perspective.

The differentiator is really in the application stack. The application stack is where the value is, right? That's where you take this data you've collected and organise it. The data engineering is the plumbing; that's how I look at it. It is the plumbing of getting all this data, organising it in the right format, making sure it's always available, and normalising it so that you don't always have to read petabytes of data. You take these petabytes of data and normalise them down to, say, 200TB. Now your engine can run more effectively, versus trying to ingest a petabyte of data every time. So that's what the data engineering pipes do. Then you leverage the CPUs and GPUs and build the ML models, right? So the way we look at it, our differentiator is not just in the technology stack, not just bringing Lyve Cloud storage and the compute layer, but also the ML Ops and data engineering layer for customers.
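The normalisation step Naik describes, collapsing petabytes of raw records into a much smaller dataset the engine can ingest, can be sketched as a simple roll-up. This is an illustrative example with hypothetical sensor records, not Seagate's actual pipeline:

```python
from collections import defaultdict

# Illustrative roll-up: collapse raw per-reading records into one
# aggregate per (site, metric), so downstream analytics reads far
# fewer rows than the raw stream. Records are hypothetical.

raw = [
    {"site": "sg-1", "metric": "temp", "value": 21.0},
    {"site": "sg-1", "metric": "temp", "value": 23.0},
    {"site": "sg-1", "metric": "vib",  "value": 0.4},
    {"site": "us-2", "metric": "temp", "value": 19.0},
]

def rollup(records):
    groups = defaultdict(list)
    for r in records:
        groups[(r["site"], r["metric"])].append(r["value"])
    return {k: {"count": len(v), "mean": sum(v) / len(v)}
            for k, v in groups.items()}

summary = rollup(raw)
print(summary[("sg-1", "temp")])             # {'count': 2, 'mean': 22.0}
print(len(raw), "->", len(summary), "rows")  # 4 -> 3 rows
```

At petabyte scale the same idea, applied per time window in a distributed engine, is what turns a raw lake into something a model-training job can read repeatedly without re-ingesting everything.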


Do you see deep learning, forming part of the next evolution of machine learning or, do you see that progressing onto AI as well?

I have said this a few times before, and some people don't like it. AI is nothing but the same algorithms that were running 20 years ago. But it's just running a lot more data. That's really what it is.

When you actually have a lot more data, now you start seeing patterns. When you have a limited amount of data, you don't have the ability to generate patterns. When you have a lot of data, over the last decade let's say, now you can actually get climate data and look at the patterns based on human activity. What does it look like for the last 50 years of data? The more data you collect, the more intelligence you can gain because you now have the ability to do pattern recognition. So deep learning is exactly that. The more data that you collect, the more data that you have accessible, it’s going to give you the ability to do a lot of deep learning, and be able to really generate that intelligence that is required to make a fundamental change in the world.

Image source: Seagate.

So are your customer requirements in Singapore, for example, any different from those in Europe or America?

Storage is ubiquitous. Though we talk about the different workloads we have, backup and archive and content repository and data analytics, storage is a horizontal offering. It doesn't matter what you store, it all sits within the object storage that we build. So, from an offering perspective, we offer the same platform and the same suite of features to our customers.


And finally, what can we expect from Lyve Cloud in the next three years?

We have a strategy to continue to scale, and we are scaling very aggressively, both horizontally and vertically. Vertically in terms of product features, capabilities, and additional services that we bring to our customers.

We've been around for just under two years now. And in those two years, if you look at our product today versus when we launched, it is a very different product and a very different service.

I like to say our product is our service, right? The service is really the product. And I see it continuing to evolve. We started with the storage service. We have Lyve Mobile, which is our data migration solution; we can move petabytes of data physically. Now think about moving 10 petabytes of data from one part of the country to another: over the network, it takes months and costs tens of thousands of dollars. We have the Lyve Mobile device, a one-petabyte to five-petabyte device with wheels on it. A forklift comes in, picks it up, and puts it in the UPS truck, and you can move five petabytes of data from Singapore to the US in 24 hours, right? At a fraction of the cost. So that's a new service that we offer.
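Naik's months-versus-days claim is easy to sanity-check with back-of-the-envelope arithmetic. The link speeds below are assumptions for illustration, not figures from the interview:

```python
# Back-of-the-envelope: time to move 10 PB over a network link at
# various sustained speeds. Link speeds are illustrative assumptions.

PETABYTE = 10**15  # bytes (decimal petabytes, as storage vendors count)

def transfer_days(bytes_total, gbit_per_s):
    bits = bytes_total * 8
    seconds = bits / (gbit_per_s * 10**9)
    return seconds / 86_400  # seconds per day

for speed in (1, 10, 100):
    days = transfer_days(10 * PETABYTE, speed)
    print(f"{speed:>3} Gbit/s: {days:8.1f} days")
```

At a sustained 1 Gbit/s, 10 PB takes roughly 926 days, over two and a half years; even at 100 Gbit/s it is over a week of perfect, uninterrupted throughput. A shipped appliance is bounded only by transit time, which is the arithmetic behind physical migration services like Lyve Mobile.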

We also have a tape-to-cloud migration service; a lot of data still sits on tapes. We bring that in, digitise it, and migrate it to the cloud. So we'll continue to evolve our services and add more offerings. Analytics is our latest, and we'll continue to grow that.

And then horizontally, it is about growing geographically, expanding beyond the US, Singapore, and other parts of Asia. We actually have plans to grow in APAC significantly over the next 12 to 18 months. So yes, I see Lyve Cloud being a multi-exabyte storage cloud in three years' time.

