An entire Herculaneum scroll has been read for the first time
Preprint: https://scrollprize.org/pdf/main.pdf
https://github.com/ScrollPrize/villa
737 points by verditelabs - 166 comments
Do we have better imaginations? Can our sci-fi writers come up with something equivalent that is as dizzyingly far from what we know now, as now is from what Aristocreon knew?
Hats off!
I feel the opposite of that feeling and am immensely proud of everything that the core challenge team has accomplished
Certainly my Mark 1 eyeballs would not obviously perform better than random guessing at this task. Although my eyeballs are, if nothing else, nerfed by only being able to see a 2D slice of the data.
A lot of labeled data is available on our FTP server, which has public access.
Could it be automated to the point where it's faster to scan a book closed than opened?
Where else do you think these techniques could be applied?
A stable base corpus and some dynamic programming will allow you to clean up the remainder[0].
[0]: http://stackoverflow.com/a/11642687/2449774
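For anyone wondering what "a base corpus and some dynamic programming" can look like in practice, here is a minimal sketch of cost-based word splitting for text written without spaces. I'm assuming that is the kind of cleanup meant here. The vocabulary, the Zipf-style cost, and the unknown-character penalty are all made up for illustration; a real corpus would supply the word ranks.

```python
import math

# Toy vocabulary ordered from most to least frequent (made up for this sketch).
VOCAB = ["the", "of", "gods", "on", "scroll", "ink", "is", "read"]

# Zipf-style cost: rarer words (higher rank) cost more.
WORD_COST = {
    word: math.log((rank + 1) * math.log(len(VOCAB)))
    for rank, word in enumerate(VOCAB)
}
UNKNOWN_CHAR_COST = 10.0  # assumed penalty per character not covered by a known word
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def segment(text):
    """Split text into the lowest-cost sequence of pieces via dynamic programming."""
    n = len(text)
    # best[i] = (cost of the cheapest split of text[:i], start index of its last piece)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            piece = text[start:end]
            piece_cost = WORD_COST.get(piece, UNKNOWN_CHAR_COST * len(piece))
            total = best[start][0] + piece_cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Walk the stored start indices backwards to recover the pieces.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("ongodsthescrollisread"))
```

Characters that no known word covers just fall out as expensive leftover pieces, which is roughly the "remainder" you would then go over by hand.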
Though I have an interest in Old Norse and I spend a lot of time reading Scandinavian runestones. > 90% of them are grave markers for a dead father, mother, brother, sister, cousin, etc. If I've learned anything from that, it's that people across time and space all lead lives as real and complex as anyone else's. Their joys were as high as mine have been and their sorrows as low as mine have been.
Once you have some unwrapped papyrus, you can render it to an image and look for ink. Ink leaves a certain texture that can be identified by the naked eye and labeled. Between these two processes you get the segmentation and ink detection ground truth. Segments can be flattened virtually through existing software and algorithms.
I can see why you'd be attracted to this project from a "let's solve problems computationally" perspective (never mind the historical side). It sounds like there are some cool problems in there.
The eye toward automating the process that the project seems to be targeting is particularly cool, too. This is the kind of stuff that makes me have real enthusiasm for ML.
Ah, the good old bitter lesson strikes again
To give numbers, for ideal portions of scrolls, we can read 100% of the characters. In nonideal portions of scrolls, we can read 0% of the characters. It's not really possible to quantify how much we could theoretically recover of that 0% through better methods, and how much is truly destroyed.
https://en.wikipedia.org/wiki/Nigel_Richards
Congratulations, and thank you!
https://en.wikipedia.org/wiki/List_of_lost_literary_works
I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed.
What would you like to have instead?
The Epicureans and Stoics did not care much about Christians and Jews, but after the Christians obtained power in the Roman Empire they made great efforts to persecute and discredit the Epicureans and the Stoics, as the most dangerous kinds of non-believers. (Unlike the rational Epicureans and Stoics, the traditional polytheists could be converted to Christianity much more easily, by inventing a set of Christian saints to which the former polytheists could redirect the prayers and the holidays to which they were habituated.)
The Christian propaganda has created a false image of the Epicureans, which has persisted until today.
The Epicureans were not atheists, but they had a very different conception about what Gods are. They thought that in nature there are a lot of entities that have a god-like power, i.e. humans are too small and weak to influence them in any way, but the life of the humans is strongly dependent on the actions of those entities, so they can rightly be considered as gods. Examples of such entities are the Sun, the Moon, storms, volcanos etc.
Unlike in the traditional Greek and Roman religions, where it was believed that for each such natural phenomenon there exists some sentient god who can be persuaded by prayers and sacrifices to change events to a more favorable outcome, the Epicureans believed that the gods, even supposing they were sentient, do not care about humans any more than humans care about ants, so there is absolutely no point in praying to them or bringing them sacrifices.
Therefore humans should conduct their lives according to ethical principles, but without worrying about what gods may think about their actions.
Many modern humans would probably agree with the Epicurean philosophy, which was completely different from what the Christian propaganda claimed, e.g. that Epicureans were some kind of sinners addicted to pleasures.
History! That's what intrigues me the most: texts with accounts of events that have otherwise vanished from the historical record.
Have any attempts (or just ideas) been made to recreate such charring on known texts?
The team did "the campfire scroll" experiment a few years ago to replicate carbonization, unrolling, and ink detection. That is the only case I am aware of. It proved the method could work, but it's not a source of, say, training data; it varies too much from the real scrolls.
The main limitation is time and cost. We have to scan on what is AFAIK the most powerful X-ray beamline in the world. It is not cheap.
In this case the time on the equipment would need to be included, both a portion of the cost of building/maintaining it, and probably the energy needed to run it. Even where the government is providing the grant (likely here), it still needs to be accounted for.
For iron gall ink with high enough iron concentration, the ink stands out in the X-ray volume simply by masking off low values, as was shown in our campfire scroll experiment a few years ago. No Herculaneum scrolls show similar ink.
I am, though, not a papyrologist, so historical ink making, preparation, and usage are not my field.
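To make the "masking off low values" idea for iron gall ink concrete, here is a minimal sketch on a tiny made-up volume. The voxel values and the threshold are assumptions chosen for illustration, not numbers from any real scan, and as noted above this simple approach does not work on the Herculaneum scrolls themselves.

```python
def mask_low_values(volume, threshold):
    """Zero out every voxel below threshold, keeping the brighter (denser) ones."""
    return [
        [[v if v >= threshold else 0 for v in row] for row in plane]
        for plane in volume
    ]


def count_kept(masked):
    """Count the voxels that survived the mask."""
    return sum(1 for plane in masked for row in plane for v in row if v > 0)


# Hypothetical 2 x 2 x 3 volume: low values stand in for papyrus and air,
# high values for iron-rich ink.
volume = [
    [[10, 12, 200], [11, 210, 9]],
    [[8, 13, 10], [220, 12, 11]],
]
THRESHOLD = 128  # assumed cutoff between papyrus and ink

masked = mask_low_values(volume, THRESHOLD)
print(masked)
print(count_kept(masked), "voxels kept")
```

With dense ink the surviving voxels already trace the writing, which is why no learned ink detector is needed in that case.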
It's easy to just read about the breakthrough and see it as one neat, linear line to get there, and hard to comprehend the hours, months and years that so many spent to get there. Big congrats to you, Sean, Nat and the entire team!
Major kudos to all of you on your achievements! This is amazing work for anthropology and for society, and it's greatly appreciated.
You have the potential to rewrite the history of European Antiquity quite substantially. The Herculaneum set of scrolls is enormous and must contain a lot that is hitherto unknown.
That comes with a set of peculiar risks. Once your work starts producing something that contradicts previous work of Very Important People, they will lobby to stop you. Be prepared for that.
Science should be neutral and always value new evidence. Scientists as humans are unfortunately subject to all sorts of passions.
We have very little written material surviving from Rome, at least from the period before the codex (book) was invented, which was more durable than a scroll. Often, we only know of one source describing important events, and when it comes to political struggles and civil wars, the perspective of the defeated party often did not survive. The punishment of damnatio memoriae was practised, and even among the early emperors, Caligula and Nero were subject to a form thereof. (This library in Herculaneum was buried 11 years after Nero's death.) I would be surprised if everything in the scrolls perfectly aligned with the record that survived for 2000 years and that was filtered by both random chance and political/religious censorship. Even Christians later destroyed some pagan texts.
BTW personally, I would love for some textbook of Etruscan to emerge from there. This was once again a language whose teaching was banned in Rome.
So imagine how cool it would be to find a full library with thousands of scrolls across many different topics that can now be read with this technology.
It's also well known that surviving texts survived because they were copied again and again on costly animal skin during the Middle Ages, by monks who had to make a choice and naturally favored topics that were of most interest to them.
This could quite literally change everything.
[0] https://talesoftimesforgotten.com/2021/09/25/are-there-more-...
Heterodoxy (or really, orthodoxy) wasn't really a thing in 79 AD, and you're not likely to find much of it in the private library of a wealthy Roman's vacation home. The only forbidden work you're going to see from that era is stuff critical of the emperor.
Or of technology: steam power, mechanical computation (like the Antikythera mechanism, which is the only known example of such a thing until 1300 years later), mechanized production, mining techniques, etc.
The exception though would be Greek literature. Greek literacy collapsed in the early medieval era and a large catalogue was probably just scrapped or discarded before even being collected in monasteries. Herculaneum could represent a legitimate treasure trove in that regard.
So we can just get ChatGPT to fill in the blanks.
There are lots of very smart folks working on incredible things, they just aren't as loud.
A Post-Great Solar Flare of 2484 Step Brothers DVD Has Been Decoded
I'm kind of obsessed with the ancient world. I dream of being able to read entire pages of new text from ~2,000 years ago.
How much of the translator's bias makes these seem like academic papers instead of social media posts?
For anyone who wants to read ancient texts, there are bilingual editions, for example those of the "Loeb library".
The translations that omit the original text are just for the people who want to have some idea about the content, but do not care about the correctness of the translation.
With a bilingual edition, it is easy to understand the original text even with relatively little knowledge about the original language.
The original text is important because frequently the translator is forced to introduce inaccuracies in the translation, because of the absence of exact equivalents in the target language, which would require a long explanation of the original meaning, instead of just a translated sentence.
Especially misleading are translations where several distinct ancient words are translated using the same English word, so some nuances are lost.
Equally confusing are the cases when the translator chooses to translate the same ancient word by different English words. Even if the meaning of a word may depend on the context, many translators fail to judge the context correctly because they lack specialized knowledge, so their guesses are not necessarily better than those of readers who may be less competent in linguistics but more competent in the science or technology needed to understand the context. Better translators prefer to use a one-to-one mapping between words, which makes it easier for the readers to discover the meaning intended by the ancient writer, after seeing multiple examples of usage.
To think that there is some sort of absolute truth of how something ought to be translated is IMHO just not reality. Especially when it comes to texts that were not only oral literature long before being written down, but for which we of course have no copies of the originals (whatever original means in this context), only transcriptions of transcriptions of...
Take Beowulf for one. While perhaps Shippey's translation is very much faithful to the copy we have, is it "better" (whatever that means...) than Tolkien's? Or Heaney's? Could we say what the poet would have liked more had they sat here in 2026 and read them all? Of course not, and having a multitude of different translations is what we need to fully enjoy these texts (since not all will be able to learn the different Ancient Greek dialects, Latin, Old English, Sumerian, etc. I'm saying this as someone who is now studying Ancient Greek).
Casual letters and graffiti would be closer to tweets.
While I step through the valley of the shadow of death,
I contemplate my life and perceive that nothing remains.
For I have hurled weapons and laughed for so long that
Even to my mother, my mind appears to have departed.
Yet I have deceived no one except him who was worthy of it;
For me to be held as a coward—that indeed is unheard of.
Beware what you speak and where you set out,
Lest you and your companions be outlined in chalk.
Latin is also a very rich language and this is no snippet.
Translation is always hard, especially from a couple thousand years ago BUT this kind of translation comes with a lot of confidence.
* ἐκ- = “out,” “thoroughly,” “to the end”
* πονέω = “to labor,” “to toil,” “to work hard”
> * ἐκ- = “out,” “thoroughly,” “to the end”
ἐκ is more motion away from something. It's often an intensifier in verb compounds but not really as a standalone preposition.
Ancient Greek is a very different language from English. I've found that people who try to brute force it by looking up individual words without a knowledge of the grammar end up with a worse understanding of a text than someone who just reads in translation.
- Philodemus, On Gods, Book 8 Year 0. Ish. :}
You mean ropes and carts?
How does ASML make the most modern chips? You mean light and mirrors?
Any master stoneworker from any era should be able to carve stone to that level of precision given enough time and reason. The problem, as always, is that there is usually very little reason to put in that amount of time and effort when you can get 90% as good for 50% the effort.
My dentist is pretty good at doing this too, by putting marking paper between my teeth and having me bite down. I wonder if a similar technique could be used:
Have the blocks close together, constrained to only move on a single axis by rails or whatever. Drape a thin sheet of material over one of the blocks, the non-moving one (perhaps it's an already-placed one?) Maybe it's something that visibly shows when it's crushed, or maybe it's coated with the blood of the powerless. Smash the other block into it. Pull them apart and look where they made contact. If it's mostly everywhere, done. If not, grind down or chip out the parts that touched. Repeat until you run out of innocents.
To do the very last block, you'd have to meld two sides, remove a block, fix up the other side, and then put it back in. Which might make this testable.
But I'm just pulling stuff out of my nether orifice.
But really impressive stuff! Between this and (a particularly optimistic outlook on) the Linear A news from the other week, this is an exciting time for linguistics.
A thought: I guess the days of scratch off lottery tickets are numbered?
Apparently they did CT scans of closed books and read the content. Polevoy, Dmitry V., et al. "From tomographic reconstruction to automatic text recognition: the next frontier task for the artificial intelligence." Fifteenth International Conference on Machine Vision (ICMV 2022). Vol. 12701. SPIE, 2023. https://iris.unive.it/bitstream/10278/3687069/1/Albertin_et-...
So yeah, but lottery companies probably make it harder by engineering against it.
Beautifully ironic, that we find this message.
I love stuff like this because it gives a glimpse into Roman society. To me it seems like they were very similar to us today, forever contemplating learning, existence, gods.
Emphasis mine.
Amazing!
(Btw, you can use the 'edit' link to fix things like this if the software gets a title change wrong.)
They are in a variety of conditions - some of them people were able to "break" open and read. But the vast majority of what remains is too delicate and brittle to risk.
Fantastic work!