HN.zip

Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix

112 points by Bogdanp - 82 comments
jawns [3 hidden]5 mins ago
For all the benefits, there is a large problem with this approach that often goes unacknowledged. It is fundamentally a business problem, rather than a technical problem, but it has an impact on development speed, so it's secondarily a technical problem.

The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, now you don't have to think just about your own use case, but about all of the potential use cases throughout the organization, and you likely need to get sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.

It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"

Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.

jfengel [3 hidden]5 mins ago
I tried, for some time, to develop a product designed to solve this. It would have made it easier to specialize models locally while complying with the corporate one. (Basically, beefing up the data definition language to something like Prolog, and putting real thought into making the corporate model reality-based rather than just what suits your current requirements.)

Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.

But am I bitter about it? No, why do you ask? Twitch, twitch.

bertails [3 hidden]5 mins ago
UDA embraces the duplication of models: it's a fact of life in the enterprise. That is why "domains" are first-class citizens. We believe that good discovery capabilities will increase the reusability of domain models. Our next article will dive more into the extensibility capabilities of the metamodel Upper.
citizenpaul [3 hidden]5 mins ago
IME it often comes down to "big men" issues, where someone important wants the data in a certain way that is not logical or consistent, so they won't let the "tech people" simply take the data and present it in a way that is logically consistent and follows best practices. They want to sit in meetings and create their own mental-model monstrosity and force the devs to build it. Once that happens one time, there is zero chance of the company ever having a consistent data model at any point in the future.

Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.

bertails [3 hidden]5 mins ago
> It is fundamentally a business problem, rather than a technical problem, but it has an impact on development speed, so it's secondarily a technical problem.

Yes, it is "fundamentally a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.

> But think about the red tape that introduces.

We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.

But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.

(I'm one of UDA's architects.)

datadrivenangel [3 hidden]5 mins ago
How can it be universal if everything isn't in UDA?
cush [3 hidden]5 mins ago
> It is fundamentally a business problem, rather than a technical problem, but it has an impact on development speed, so it's secondarily a technical problem.

It doesn't read from the article that they are denying that it's a business problem. The models they're defining seem to span all roles, engineering being only one.

wjnc [3 hidden]5 mins ago
Data drift is real! I’ve recently restored sanity in a medium-sized enterprise that had three concurrent financial data flows, complete with people not understanding each other, projects to find the ground truth, and triple the workload in maintaining the data flows. I’ve quipped to the team that endless summer is near: what if we only worked on business-relevant development? I would dream that the bigcorp we are part of would do the same. They are more of a "tack on another Excel-based solution" kind of firm.
datadrivenangel [3 hidden]5 mins ago
Data drift is real, and the yoke of governance chafes enough that new people insist on redoing your work in Excel until the problem gets bad enough that a new data governance push is needed.
Spooky23 [3 hidden]5 mins ago
The alternative is the same barriers, except with a parallel phone-a-friend governance model when you have to share data between verticals or programs.

It’s a classic pattern in public sector applications, where it’s partially deliberate.

tomrod [3 hidden]5 mins ago
Corollary to Hyrum's Law, then. Perhaps we call it the "Orange is the New Model" Law.
mkoubaa [3 hidden]5 mins ago
Love it
stathibus [3 hidden]5 mins ago
At a place like Netflix where the product has been fundamentally the same for almost a decade, installing this kind of red tape is great for job security
echelon [3 hidden]5 mins ago
> installing this kind of red tape is great for job security

It really isn't, and that's not the point. This is for business entities that are larger than teams.

It's way worse to have a million different schemas with no way to share information. And then you have people everywhere banging on your door asking for your representation, you have to help them, you have to update it in their systems. God forbid you've got to migrate things...

If your entity type happens to be one that is core to the business, it's almost a neverending struggle. And when you find different teams took your definition and twisted it, when you're supposed to be the source of truth, and teams downstream of them consume it in the bastardized way...

This project sounds like a dream. I hope it goes well for Netflix and that they can evangelize it more.

rco8786 [3 hidden]5 mins ago
This doesn't sound significantly different than any other large tech org.

If your data/service/api is used by a lot of other people in the org, you have to work with them to make sure your change doesn't break them. That's true regardless of the architecture.

dboreham [3 hidden]5 mins ago
Reminds me of my experience trying to understand what SAP actually is. For decades I wondered what sort of magic tech must be in there that allowed their software to be used by thousands of different businesses. Then someone who knew about SAP told me: "oh, no that's not how it works -- what they do is have a fixed schema and tell the customer that they must adopt it".
UltraSane [3 hidden]5 mins ago
Epic EMR is the same. But then some hospitals insist on customizing it which causes no end of problems.
giantg2 [3 hidden]5 mins ago
You could store the info as a common definition and then just use transformations on retrieval or storage if there's an exception for that system/business group.
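
A minimal sketch of that idea in Python (the canonical type and the view functions are hypothetical, just to show the shape):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CanonicalMovie:
        # The common, org-wide definition everyone stores against.
        movie_id: str
        title: str
        contract_id: str
        file_uri: str

    def to_finance_view(m: CanonicalMovie) -> dict:
        # Finance's exception: it only cares about the contract.
        return {"movie_id": m.movie_id, "contract": m.contract_id}

    def to_playback_view(m: CanonicalMovie) -> dict:
        # Playback needs the asset location, not the contract.
        return {"movie_id": m.movie_id, "title": m.title, "uri": m.file_uri}
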
thefourthchime [3 hidden]5 mins ago
sometimes grug go too early and get abstractions wrong, so grug bias towards waiting

big brain developers often not like this at all and invent many abstractions start of project

grug tempted to reach for club and yell "big brain no maintain code! big brain move on next architecture committee leave code for grug deal with!"

but grug learn control passions, major difference between grug and animal

instead grug try to limit damage of big brain developer early in project by giving them thing like UML diagram (not hurt code, probably throw away anyway) or by demanding working demo tomorrow

working demo especially good trick: force big brain make something to actually work to talk about and code to look at that do thing, will help big brain see reality on ground more quickly

remember! big brain have big brain! need only be harness for good and not in service of spirit complexity demon on accident, many times seen

https://grugbrain.dev/#grug-on-complexity

Multicomp [3 hidden]5 mins ago
It's been so long since the Semantic Web and RDF and OWL and SKOS. I'm so glad they stuck with W3C and didn't reinvent those wheels. Will this UDA approach catch on? I don't know, but I hope so. It seems like it is trying to move the frontier of the difficulties of applying Domain-Driven Design and semantic concepts to an enterprise company of significant scale.

If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed, or otherwise forced to a least common denominator just so they can fit past a network or other IPC barrier.

For that, I'm grateful Netflix is working on this and publicizing the interesting work.

majormajor [3 hidden]5 mins ago
I'm curious if anyone has seen business improvements along the lines of "this let us discover something that led to 5%+ or >$5M improvements" (percent or absolute depending on how big the company is) from these kinds of efforts?

I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before I decided to focus on other parts of the software stack, and a lot of it just seemed like "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on." (This is specifically different from the much LARGER sort of problem, which is more a copypasta one - Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right - which is one dataset needed in multiple places, vs. multiple datasets needed in different places.)

I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?

And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.

RobinL [3 hidden]5 mins ago
> "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on"

This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.

But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?

frankdejonge [3 hidden]5 mins ago
A bit unfortunate they used the term domain model here. Domain models here are purely data-centric, whereas domain modeling focuses mainly on behavior, not underlying data structures. The data used in domain models is there to facilitate the behavior, but the behavior is the core focus.

From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue, though, that this is a feature and not a bug. Not the same level of nuance and complexity is needed in all of the use cases. And representational models are usually optimized for particular read scenarios; this seems to argue against that, mandating uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is quite uniform, though I have most often seen that use cases get complicated when they do not simplify concepts that are very complex and nuanced in their core domain model.

cpard [3 hidden]5 mins ago
Reminds me of the work done at Uber with Dragon

https://www.uber.com/blog/dragon-schema-integration-at-uber-...

Unfortunately it never got open sourced, but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language, which are open source.

You can find more information on this fascinating work here:

https://github.com/CategoricalData/hydra

I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing on and formalising semantics, and then of course maintaining them.

I wonder if LLMs can help with that part today.

enjoylife [3 hidden]5 mins ago
> Once concepts are selected, Sphere walks the knowledge graph and generates SQL queries to retrieve data from the warehouse, no manual joins or technical mediation required.

If I had to guess, this is how eng pitched it to the business to carve out the time to build this tooling. As with all these internally built schemas, UIs, tooling, etc., they're never gonna post how much this is actually used relative to the workarounds DS and eng use in their day-to-day.

bertails [3 hidden]5 mins ago
The proof is in the 500+ domain graph services federated into our GraphQL enterprise gateway, which will all be exposed to Sphere through UDA. That's real.
bravesoul2 [3 hidden]5 mins ago
Why would Netflix engineering host on Medium? Very odd. And you just lose readers to the popups but you don't benefit from their discovery much either.
yyhhsj0521 [3 hidden]5 mins ago
So they don’t have to maintain it themselves
echelon [3 hidden]5 mins ago
> you don't benefit from their discovery

Sure you do.

And the types of engineers writing on Medium are the ones they want to recruit, too.

bertylicious [3 hidden]5 mins ago
How does this relate to domain-driven design? It seems to be at odds with it, because in DDD it's kind of expected that the same concept will be represented in a different way by each system? But to be honest, I didn't read the whole blog post because of the UML vibes.
bertails [3 hidden]5 mins ago
> How does this relate to domain-driven design?

The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) as the D in DGS (Domain Graph Service).

> in DDD it's kind of expected that the same concept will be represented in a different way by each system

In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.

regularfry [3 hidden]5 mins ago
It doesn't. It's a blessing that they avoided the term "ubiquitous language" because that's almost exactly the dual of this concept, although people who have only ever heard the words and not dug any deeper won't know what the difference is.
borromakot [3 hidden]5 mins ago
https://ash-hq.org

> Model your domain, derive the rest

Been doing this for 5+ years.

bertails [3 hidden]5 mins ago
This does look interesting. Does the Ash Framework yield a knowledge graph? How good is it at cataloging existing data containers?
heisenbit [3 hidden]5 mins ago
I really believe a common vocabulary makes sense. But it is hard, very hard, as you spread across organizations (some yet to be bought and integrated), business processes, and time. As soon as it comes to generating stuff, things become hard. One may be able to generate interfaces between two systems, but which enterprise has only two layers? Yes, if all knowledge is captured in the central catalog we may be able to do it, but who builds this perfect database and maintains it?

Attempts that did this and survived either restricted themselves to being very abstract or limited their scope to specific use cases.

chiph [3 hidden]5 mins ago
The problem I've seen is that you define your corporate entities, but then you have these systems in other divisions which need to extend it. Whether their division's special attributes get promoted to the corporate entity for everyone to use brings in politics and optimism. And making an update to a corporate-scoped entity then means you need solid change management.

IMO they can be very valuable in terms of reduced friction and costs, if you do it right and have enough rigor/discipline in the organization. Netflix might.

smarx007 [3 hidden]5 mins ago
> Attempts that did this and survived either restricted themselves to being very abstract or limited their scope to specific use cases.

Wikidata? 1.65 billion graph nodes and counting under a common vocabulary.

Keyframe [3 hidden]5 mins ago
Having dealt with the same problems for years now (we call ours UDM - Unified Data Model, heh), I was under the impression this was an over-engineered Datamart++. It's not, though: calling UDA a datamart would be like calling K8S a bash script, which might be related but is wildly different in scope.

I am definitely interested to read more and implement it myself as well. Would also be more than happy to skip the whole GraphQL end of it.

bertails [3 hidden]5 mins ago
> Would also be more than happy to skip the whole GraphQL end of it.

Netflix benefits from a large GraphQL ecosystem with federation, which is why it's so central in UDA from day 1. But adding a projection to "REST" would be very easy.

Keyframe [3 hidden]5 mins ago
I don't doubt their yield out of GraphQL is great. Not something I have a need for, though. I'm at the helm of the tech group at one part of Dun & Bradstreet, so we have different challenges, unification across different borders being the primary one. We manage, but the going gets tough sometimes. The described architecture of UDA certainly seems to be what it was designed to solve. I think our system is even at a perfect inflection point to adopt at least some of the principles described, to provide a clear path forward to resolve some of those challenges we face; not as a replacement, but more as a control plane over our system. I can already see how we could avoid at least schema bloat, lowest-common-denominator fields, and overall rigidity.

Of course, details on "Upper", PDM, and Sphere are well - missing, but at least I have concepts to focus on :)

bertails [3 hidden]5 mins ago
> Of course, details on "Upper", PDM, and Sphere are well - missing, but at least I have concepts to focus on :)

Definitely coming soon ;-)

twodave [3 hidden]5 mins ago
I wonder how they deal with versioning or breaking changes to the model. One advantage of keeping things more segregated is that when you decide to change a model you can do it in much smaller pieces.

I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.

bertails [3 hidden]5 mins ago
> I wonder how they deal with versioning or breaking changes to the model.

Versioning is permission to break things.

Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proved to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, as we have the ability to track the consumers of the projected models.
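
One (entirely hypothetical) way to picture that consumer tracking, not UDA's actual design:

    from collections import defaultdict

    consumers_by_field = defaultdict(set)

    def record_access(consumer: str, field: str) -> None:
        # Imagine the gateway calling this whenever a projected field is read.
        consumers_by_field[field].add(consumer)

    def blockers(field: str) -> set[str]:
        # A deprecated field is safe to remove once nobody still reads it.
        return consumers_by_field[field]

    record_access("billing-service", "Movie.contractId")
    print(blockers("Movie.contractId"))  # {'billing-service'}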

oh_my_goodness [3 hidden]5 mins ago
No, unfortunately the activity is not modeling at all. It's software development. Pretending otherwise will not make our thinking (or data structures) more logically consistent.

I feel the dream. But we went to that place 25 years ago, and we saw that it was stupid.

Tell you what, I'll do a raffle. Leave a comment telling me that I just don't get it. One lucky winner will get my copy of this book https://www.amazon.com/Unified-Modeling-Language-Addison-Wes.... You pay shipping.

b0a04gl [3 hidden]5 mins ago
how much of upper is actually enforced at runtime vs just used for schema generation? like if a downstream system silently breaks a semantic assumption (say, infers enum incorrectly or drops a type constraint), does uda catch that anywhere or is this trust-based across projections?
bertails [3 hidden]5 mins ago
Great question. It really depends on the projection. For example, the projections to GraphQL and Java are mostly limited to what can be expressed there. But the projection to SHACL has access to all of SPARQL Constraints, which is what's used for the bootstrapping knowledge graph. We are looking into being able to do more runtime validation for data in the warehouse.
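
For readers unfamiliar with SHACL, a toy pyshacl check (shapes and data invented here, not UDA's) looks roughly like:

    from rdflib import Graph
    from pyshacl import validate

    data = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:movie1 a ex:Movie ; ex:runtimeMinutes "ninety" .
    """, format="turtle")

    shapes = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .
        @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
        ex:MovieShape a sh:NodeShape ;
            sh:targetClass ex:Movie ;
            sh:property [ sh:path ex:runtimeMinutes ; sh:datatype xsd:integer ] .
    """, format="turtle")

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)  # False: "ninety" is not an xsd:integer
    print(report)
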
praveen9920 [3 hidden]5 mins ago
The main challenge with this approach is change management of the model schema. Apart from consensus for updating the schema, maintaining versioned models across services becomes a challenge. Let’s say someone deprecates a field in the schema: all services need to update their business logic based on that, which is challenging and against the ethos of distributed services.
regularfry [3 hidden]5 mins ago
It's a challenge but in principle it's doable with contract testing, in the style of Pact, where there's a contract broker that disparate services all coordinate through. If you've got that, you can publish your new model version as a new contract version, and everyone can see immediately where their APIs need to change. Contracts do get a passing mention in the article, but it's not a focus.
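
A toy version of the broker idea (this is the shape of it, not the real Pact API):

    CONTRACTS = {
        # consumer -> fields it expects from the shared "Movie" model
        "recommendations": {"movieId", "title", "genres"},
        "billing": {"movieId", "contractId"},
    }

    def breaking_consumers(new_model_fields: set[str]) -> dict[str, set[str]]:
        # Publish a new model version and see immediately who breaks, and why.
        return {
            consumer: missing
            for consumer, expected in CONTRACTS.items()
            if (missing := expected - new_model_fields)
        }

    print(breaking_consumers({"movieId", "title", "genres"}))
    # {'billing': {'contractId'}} -> billing still depends on contractId
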
detaro [3 hidden]5 mins ago
It's still distributed services in the service of one entity. So why is something deprecated without a clear plan for what existing users will do?

To me it feels related to the monorepo-or-not discussions?

nialse [3 hidden]5 mins ago
From the "What is ERM? We don’t need DBAs. Why use a SQL DBMS?" department.

On a more serious note, scaling of a distributed system and the associated teams necessitates handling one’s data systematically. Fixing it afterwards looks painful.

killthebuddha [3 hidden]5 mins ago
I feel like the Netflix tech blog has officially jumped the shark.
adamtaylor_13 [3 hidden]5 mins ago
I’ve never been so happy I don’t work on systems this large. Holy cow.
bob1029 [3 hidden]5 mins ago
This kind of problem could be made a lot more straightforward if we separate the schema owner (i.e., the business) from the rest of the stack. Some major SQL engines have this role built-in. Whatever you want to call it - "premature" optimization, etc. - the act of simultaneously trying to optimize while you build is perhaps important, but otherwise very disruptive to the creative exercise of naming things and relating them together (domain modeling).

When your brain is constantly locked into big-O notation and you are only worrying about N being larger than a billion, it becomes really easy to justify running a high quality representation of the domain into the dirt over arbitrary performance concerns. E.g., storing a bunch of tiny fields in one JSON blob column is going to be faster for many cases, but it totally screws up downstream use cases by making custom views of the data more expensive. The query of concern might only hit once a day, but the developers probably aren't thinking at that level of detail.

The really tragic part is that the modern RDBMS is typically capable of figuring out acceptable query plans even given the most pathetically naive domain models. I think in general there is a severe (and growing) misunderstanding regarding what something like MSSQL/Oracle/DB2 can accomplish - even in an enterprise as large as Netflix.
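
A small SQLite illustration of that trade-off (assuming a build with the JSON1 functions):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE movies_blob (doc TEXT)")             # one JSON column
    con.execute("CREATE TABLE movies_cols (id TEXT, title TEXT)")  # real columns
    con.execute("CREATE INDEX idx_title ON movies_cols(title)")

    con.execute("""INSERT INTO movies_blob VALUES ('{"id": "m1", "title": "Heat"}')""")
    con.execute("INSERT INTO movies_cols VALUES ('m1', 'Heat')")

    # The blob query unpacks JSON on every row; nothing indexes it as written.
    print(con.execute(
        "SELECT doc FROM movies_blob WHERE json_extract(doc, '$.title') = 'Heat'"
    ).fetchone())

    # The column query can use idx_title directly.
    print(con.execute(
        "SELECT id FROM movies_cols WHERE title = 'Heat'"
    ).fetchone())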

behnamoh [3 hidden]5 mins ago
I thought UDA meant they made CUDA but made it cross-platform :')
alganet [3 hidden]5 mins ago
> ... RDF ... SPARQL ... OWL ...

I want to believe. (really! I think that's hugely underestimated tech).

bertails [3 hidden]5 mins ago
We joke internally that Upper is like "RDF: The Good Parts".
echelon [3 hidden]5 mins ago
It's 2005 again!

These tools were pretty cool and an enormous amount of work was put into them.

The ontologies were extremely extensible. There just wasn't enough of an ecosystem putting them into practice and demonstrating their utility.

Their examples are nice:

https://github.com/Netflix-Skunkworks/uda/blob/9627a97fcd972...

alganet [3 hidden]5 mins ago
That's Turtle, it's an awesome RDF serialization. https://www.w3.org/TR/turtle/

Imagine trying to convey this example in RDF/XML (that's more like 2005).
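
A few rdflib lines make the contrast concrete (toy graph; the RDF/XML output speaks for itself):

    from rdflib import Graph

    ttl = """
    @prefix ex: <http://example.org/> .
    ex:movie1 a ex:Movie ;
        ex:title "Heat" .
    """

    g = Graph()
    g.parse(data=ttl, format="turtle")

    # The same two triples, serialized as RDF/XML: valid, but much noisier.
    print(g.serialize(format="xml"))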

RDFa and microdata stuff for sharing got pretty far, but those are often simpler vocabularies (at least when seen from the outside, maybe folks who index that shit have something nicer going on, idk).

Honestly, I feel kind of relieved seeing Netflix using this stuff. I've suggested using this kind of tech to model knowledge in systems that had this problem of knowledge representation several times, but always had a hard time when people said "if it's so good, why does no big player use it?"

cletus [3 hidden]5 mins ago
I realize scale makes everything more difficult but at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard. There are a few statements in this that gave me pause.

The core problem seems to be development in isolation. Put another way: microservices. This post hints at microservices having complete autonomy over their data storage and developing their own GraphQL models. The first is normal for microservices (but an indictment at the same time). The second is... weird.

The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie". Attributes are optional. Pull what you need. Common subsets of data can be organized in fragments. If you're not doing that, why are you using GraphQL?
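
A toy fragment (names invented), parsed with graphql-core just to show the idea is first-class in the language:

    from graphql import parse  # graphql-core

    doc = parse("""
    fragment MovieCard on Movie {  # the common subset, defined once
      id
      title
      artworkUrl
    }

    query Homepage {
      trending { ...MovieCard }
      continueWatching { ...MovieCard runtimeMinutes }
    }
    """)
    print(doc.definitions[0].name.value)  # MovieCard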

So I worked at Facebook and may be a bit biased here because I encountered a couple of ex-Netflix engineers in my time who basically wanted to throw away FB's internal infrastructure and reinvent Netflix microservices.

Anyway, at FB there was a Video GraphQL object. There weren't 23 or 7 or even 2.

Data storage for most things was via a write-through, in-memory graph database called TAO that persisted things to sharded MySQL servers. On top of this, you'd use EntQL to add a bunch of behavior to TAO, like permissions, privacy policies, observers, and such. And again, there was one Video entity. There were offline data pipelines that would generally process logging data (i.e., outside TAO).

Maybe someone more experienced with microservices can speak to this: does UDA make sense? Is it solving an actual problem? Or just a self-created problem?

jmull [3 hidden]5 mins ago
I think they are just trying to put in place the common data model that, as you point out, they need.

(So their micro services can work together usefully and efficiently -- I would guess that currently the communication burden between microservice teams is high and still is not that effective.)

> The whole point of GraphQL is to create a unified view of something

It can do that, but that's not really the point of GraphQL. I suppose you're saying that's how it was used at FB. That's fine, IMO, but it sounds like this NF team decided to use something more abstract for the same purpose.

I can't comment on their choices without doing a bunch more analysis, but in my own experience I've found off-the-shelf data modeling formats have too much flexibility in some places (forcing you to add additional custom controls or require certain usage patterns) and not enough in others (forcing you to add custom extensions). The nice thing about your own format is you can make it able to express everything you want and nothing you don't. And have a well-defined projection to Graphql (and sqlite and oracle and protobufs and xml and/or whatever other thing you're using).

bertails [3 hidden]5 mins ago
> The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie".

GraphQL is great at federating APIs, and is a standardized API protocol. It is not a data modeling language. We actually tried really hard with GraphQL first.

twodave [3 hidden]5 mins ago
I totally agree. Especially with Fusion it’s very easy to establish core types in self-contained subgraphs and then extend those types in domain-specific subgraphs. IMO the hardest part about this approach is just namespacing all the things, because GraphQL doesn’t have any real conventions for organizing service- (or product-) specific types.
cush [3 hidden]5 mins ago
>at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard

Yeah maybe 10 years ago, but today Netflix is one of the top production companies on the planet. In the article, they even point to how this addresses their issues in content engineering

https://netflixtechblog.com/netflix-studio-engineering-overv...

https://netflixtechblog.com/globalizing-productions-with-net...

rorylaitila [3 hidden]5 mins ago
Good luck. This is not new. Back in the Enterprise OOP era, there was a fad of developing universal data entities. Everyone eventually learned that there is no such thing as a universal entity. The semantic meaning of the data model depends on the user context, not the producer context. A "Movie" is not the same thing to the Finance team, Acquisition team, Infrastructure team, or Customer. There is not even always a common identifier, let alone common fields, let alone common meaning of the fields.

Edit: The more I read this article the more I hear this voice https://www.youtube.com/watch?v=y8OnoxKotPQ

bertails [3 hidden]5 mins ago
UDA does not believe in the existence of universal data entities. We embrace the idea that 2+ teams may have different opinions on how to represent the world. We are focused on the discovery of existing entities across systems and their reusability through extensibility. We believe that automation of the projections will be key for teams to align on defining some entities, where it makes sense.
andsoitis [3 hidden]5 mins ago
> A "Movie" is not the same thing to the Finance team, Acquisition team, Infrastructure team, or Customer.

Shouldn’t it be?

buster [3 hidden]5 mins ago
No, why would the finance team care about the cover of a movie or the available subtitles? If everyone had the same definition, changing something about a movie would require a change in every consumer that doesn't actually care.
rorylaitila [3 hidden]5 mins ago
No, because context and use defines the meaning. To the data team, a "Movie" might mean a file on disk. To the finance team, a "Movie" might mean a contract to a studio. To the Customer, a "Movie" is something they watch. That each of these contexts can use the term "Movie" does not actually mean they share anything in common. We could have called them "Files", "Contracts" and "Watchables" instead.

When people embark on 'universal' data definitions, conversations of the type "But is it reaaalllly a Movie??" are an endless source of confusion.
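
In code, those contexts really are three unrelated types that happen to share an identifier (illustrative names only):

    from dataclasses import dataclass

    @dataclass
    class MediaFile:        # the data team's "Movie"
        movie_id: str
        path: str

    @dataclass
    class StudioContract:   # the finance team's "Movie"
        movie_id: str
        studio: str
        terms: str

    @dataclass
    class Watchable:        # the customer's "Movie"
        movie_id: str
        title: str
        subtitles: list[str]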

detaro [3 hidden]5 mins ago
Alternatively, the process of defining these global definitions exposes exactly this conflict and leads to common definitions of "Files", "Contracts" and "Watchables" instead of 3 conflicting definitions of "Movies"?
rorylaitila [3 hidden]5 mins ago
The conflict will definitely help define the terms. Maybe they will all choose "Movie", maybe not. Just there is no universally ideal term that represents a concept for all users for all time. It's a common error to seek such universal definitions.
mkoubaa [3 hidden]5 mins ago
A unique identifier for a movie is the same everywhere, like an ISBN. What the label means in each area is going to be different. That said, some things like "director" and "budget" are immutable properties of a movie but are absolutely irrelevant for some business areas, and the duplication of these properties in different domains is fundamentally not that big of a deal.
mkoubaa [3 hidden]5 mins ago
Wittgenstein sends his regards
smarx007 [3 hidden]5 mins ago
Below are some links for extra reading from my favorites.

High-level overview:

- https://www.w3.org/DesignIssues/LinkedData.html from TimBL

- https://www.w3.org/DesignIssues/ReadWriteLinkedData.html from TimBL

- https://www.w3.org/DesignIssues/Footprints.html from TimBL

Similar recent attempts:

- https://www.uber.com/en-SE/blog/dragon-schema-integration-at... an attempt in the similar direction at Uber

- https://www.slideshare.net/joshsh/transpilers-gone-wild-intr... continuation of the Uber Dragon effort at LinkedIn

- https://www.palantir.com/docs/foundry/ontology/overview/

Standards and specs in support of such architectures:

- http://www.lotico.com/index.php/Next_Generation_RDF_and_SPAR... (RDF is the only standard in the world for graph data that is widely used; combining graph API responses from N endpoints is a straightforward graph union vs an N-way graph merge for JSON/XML/other tree-based formats - see the short sketch after this list). Also see https://w3id.org/jelly/jelly-jvm/ if you are looking for a binary RDF serialization.

- https://www.w3.org/TR/shacl/ (needs tooling, see above)

- https://www.odata.org/ (in theory has means to reuse definitions, does not seem to work in practice)

- https://www.w3.org/TR/ldp/ (great foundation, too few features - some specs like paging never reached Recommendation status)

- https://open-services.net/ (builds atop W3C LDP; full disclosure: I'm involved in this one)

- https://www.w3.org/ns/hydra/ (focus on describing arbitrary affordances; not related to LinkedIn Hydra in any way)
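
A minimal rdflib sketch of the graph-union point from the first item above (toy data; this ignores blank-node subtleties, which need more care):

    from rdflib import Graph

    # Two responses from two endpoints, each a small graph about the same movie.
    g1 = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:movie1 ex:title "Heat" .
    """, format="turtle")

    g2 = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:movie1 ex:director ex:mann .
    """, format="turtle")

    merged = g1 + g2    # plain set union of triples; no merge logic to write
    print(len(merged))  # 2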

Upper models:

- https://basic-formal-ontology.org/ - the gold standard. See https://www.youtube.com/watch?v=GWkk5AfRCpM for the tutorial

- https://www.iso.org/standard/87560.html - Industrial Data Ontology. There is a lot of activity around this one, but I lean towards BFO. See https://rds.posccaesar.org/WD_IDO.pdf for the unpaywalled draft and https://www.youtube.com/watch?v=uyjnJLGa4zI&list=PLr0AcmG4Ol... for the videos

mkoubaa [3 hidden]5 mins ago
When translating from French to English, find someone who speaks both fluently and has domain expertise over the content being translated.

Don't find a linguist who understands grammatical structure and claims to be able to map the source language to some neutral intermediate structure and map that to the target language.

This is a fallacy I notice everywhere but don't know how to name. Maybe the "linguist translator" fallacy?

1776smithadam [3 hidden]5 mins ago
Doesn't Google achieve the same result with Protobuf?
happyweasel [3 hidden]5 mins ago
I share the same perspective. I was also wondering how UDA handles the problem of evolving schemas, with "old clients" communicating with newer servers or vice versa.
tantalor [3 hidden]5 mins ago
It's more like Google Knowledge Graph
waynenilsen [3 hidden]5 mins ago
Is this SOAP again?
dboreham [3 hidden]5 mins ago
ebXML again.
jaakl [3 hidden]5 mins ago
It seems to be based on the very common naive belief that things which are named the same or similarly in different domains are conceptually the same, so "let's deduplicate"? There can be rare moments when they really are, but then the moment passes and you only have troubles.
detaro [3 hidden]5 mins ago
To me the motivation seems more along the lines of "we build lots of different systems that deal in the same domains" (because they are deep in microservice land, have apps for all kinds of platforms, ...) "let's make sure they all use the same definition of the things". Do you think that doesn't make sense (because each of those should be considered their own domain?) or does something else give you your impression?