Off to Israel for “Digital Humanities” and “Language Modeling Technologies”

Gregory P. Bufithis, Esq.
Chairman
The Project Counsel Group

28 May 2014 – The term Digital Humanities (DH) has been commonly used to describe the application of computational methods in the arts and humanities for 10 years, since the publication, in 2004, of the Companion to Digital Humanities. “Digital Humanities” was quickly picked up by the academic community as a catch-all, big tent name for a range of activities in computing, the arts, and culture.

As we all know, one of the easiest things we can do with computers is count things. For data to be computationally manipulated, it has to be in numeric form. For those of us who deal with text in a legal context, if we can get text into a computational form, we can easily count and manipulate the language, show trends, themes, relationships.
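To make the counting point concrete, here is a minimal sketch in Python. The sample memos, the function names `term_counts` and `trend`, and the simple tokenizing rule are all invented for illustration; a real review platform would handle tokenization, stemming and document metadata far more carefully.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def trend(documents, term):
    """Count how often `term` appears in each document, in document order."""
    return [term_counts(doc)[term.lower()] for doc in documents]

# Hypothetical sample texts, invented for illustration only.
memos = [
    "The contract was signed. The contract is binding.",
    "Counsel reviewed the contract and the indemnity clause.",
    "No mention of the agreement here.",
]

print(term_counts(memos[0]).most_common(2))  # the two most frequent tokens
print(trend(memos, "contract"))              # per-document counts of one term
```

Once the text is reduced to counts like these, trends over time or across custodians are just a matter of lining the numbers up by date or author.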

The humanities picked up on this long before legal did. For example, at a DH conference I attended years ago I saw machine-searchable text formats long before they overran LegalTech, with “key word and phrase search” performed under a “quantitative analysis” methodology long before these terms came into vogue in legal. The iOS app Textal is just part of that trajectory.

Much of my work, though, has been in digital images, with systems reading damaged documents from Hadrian’s Wall, and multispectral and 3D manipulation of damaged texts. At this year’s Israel conference Google will include a session to show how this analysis of digital images applies in various legal contexts, hence the larger-than-normal group of legal linguists and e-discovery mavens in attendance.

This trip, though, I am more focused on the cultural, heritage and historical applications. I like to feed the “Renaissance Man” in me and encourage thinking about computational methods in the arts and humanities, and in culture and heritage, in as broad a sense as possible. The coalescing of interested scholars is natural. As the speed of computing rises, the price of computing plummets, and the information available on the internet (and the possibility of creating new information) increases, the use of internet technologies has become commonplace in all fields.

The other part of the trip is a revisit to an area I have spent some time with post-DLD Tel Aviv: automated language translation.

Perfect timing, what with today’s announcement by Microsoft that nearly instantaneous speech translation will be available on Skype by the end of the year, under its plans to “cross the language boundary” on its popular messaging service. Satya Nadella, Microsoft’s new chief executive, unveiled a test version of the software, which can translate voice calls between people, at the inaugural Code Conference in Rancho Palos Verdes, California, which my CTO is attending. A conversation was held on stage today to show that the service is already good enough to translate from English to German with only a short pause.

It is early days for this technology, and rival services are being developed by Google and NTT DoCoMo but Skype would offer considerably larger scale. Microsoft said that Skype has more than 300m connected users each month, and more than 2bn minutes of conversation a day. Microsoft has been experimenting with machine translation for more than a decade and has combined it with separate work on speech recognition accuracy. Microsoft’s Cortana voice assistant for Windows Phone already understands voice patterns. Microsoft has been developing a “deep neural net” that can recognize speech and learn from languages. More to come from my CTO later.

Easier, faster and more reliable translations, a world of seamless and immediate translation within our grasp. Yes, still the usual issues: context, syntax, intonation and ambiguity. Because a computer system is not context aware, it could grab the wrong word. Additionally, it doesn’t understand the language at all. It just tries to decode words, instead of decoding the meaning. And many languages are not similar at all: they lack corresponding common words, or they use them in different ways.

But the technology is getting much, much better. Last month in Paris I attended a Microsoft Research event for its Natural Language Processing group and saw the further developments (over last year) on their Machine Translation (MT) project, which is focused on creating MT systems and technologies that cater to the multitude of translation scenarios today, including legal. The key is Statistical Machine Translation (SMT), which breaks down into areas such as syntax-based SMT and phrase-based SMT. Plus there are Word Alignment and Language Modeling technologies.
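For readers wondering what “language modeling” means in practice, here is a toy sketch in Python of a bigram model with add-one smoothing. It is a textbook illustration only, not a description of Microsoft’s system; the three training sentences and the function names are invented for the example.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count single words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(unigrams, bigrams, prev, word):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_score(unigrams, bigrams, sentence):
    """Multiply the smoothed bigram probabilities along the sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    score = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        score *= bigram_probability(unigrams, bigrams, prev, word)
    return score

# Hypothetical toy corpus, invented for illustration only.
corpus = ["the court ruled", "the court adjourned", "the judge ruled"]
unigrams, bigrams = train_bigram_model(corpus)

fluent = sentence_score(unigrams, bigrams, "the court ruled")
scrambled = sentence_score(unigrams, bigrams, "ruled court the")
print(fluent > scrambled)  # the model prefers the word order it has seen
```

In a translation system a model like this, trained on vastly more text, is what helps pick the fluent candidate among several possible renderings of the same source sentence.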

These advanced language modeling toolkits mean that problems with morphology, syntax, semantics and word sense disambiguation are being worked out. Not solved yet, but coming. For the vendors and the multinational companies who need it, the business model is a no-brainer. An automated, instant, seamless translation platform is valuable enough to a corporation that the vendor that solves it could charge a substantial amount of money for such a tool.

And credit where credit is due. IBM really started it all decades ago when it set up the first machine translation architectures based on mathematical models called translation models. Generally speaking, a translation model accounts for all of the elementary operations that rule the process of translation between the different word orderings of the source and target languages. Translation models are usually enriched with statistical parameters, to help drive the search in the space of all valid transformations of the source sentence into the target sentence. IBM developed specialized algorithms to provide for the automatic estimation of these parameters.
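As a rough illustration of that “automatic estimation”, below is a minimal Python sketch of the simplest of these models, usually called IBM Model 1 in textbooks, trained with expectation-maximization. The assumptions are mine: the three sentence pairs are invented, the function name `train_ibm_model1` is hypothetical, and the NULL word and every refinement of the later IBM models are left out to keep it short.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(target_word | source_word) with EM."""
    corpus = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    target_vocab = {w for _, tgt in corpus for w in tgt}

    # Start uniform for every word pair that co-occurs in some sentence pair.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for src, tgt in corpus for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (target, source) link
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in corpus:
            for tw in tgt:
                # E-step: split one unit of credit for tw across the source words.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: turn expected counts back into probabilities.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t

# Hypothetical toy parallel corpus (English source, German target).
pairs = [("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch")]
model = train_ibm_model1(pairs)

for target_word in ("das", "haus", "buch"):
    print(target_word, round(model[(target_word, "the")], 3))
```

Nobody tells the program that “the” means “das”. The pairing emerges because “das” is the only word that shows up every time “the” does, which is the whole trick behind learning translation parameters from parallel text.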

Ah, technology. So hard to keep up. Just on the medical front alone the cost of manipulating life’s building blocks is falling at such a rapid clip … faster than Moore’s Law predictions for silicon … that we can now tailor drugs to match a person’s genome, and fuse technology with living matter.

And over there on the mobile/internet/design front:

  • Fitbit teams up with design icon Tory Burch. Intel partners with Barneys. Apple hires Burberry’s Angela Ahrendts. Rumors say Apple is dangling billions in front of cultural trendsetters Jimmy Iovine and Dr Dre. Fashion boasts, fashion beguiles, fashion demands. Value and quality speak softly.
  • Get a drone with a camera. Link it to your Oculus Rift glasses. Experience the world about you in profoundly new and different ways. Now, stream and share all you see and hear – on Facebook, of course. That’s Zuckerberg’s strategy.
  • One app, one task, one screen is a core value of iOS. If the new iPad (due out this coming fall) really allows two apps to run on one screen, as rumors suggest, then we immediately know two things: 1) Apple is legitimately nervous about both Samsung and Surface, and 2) Apple intends to launch an assault on the enterprise. Smart and smarter.

Lions, and tigers, and bears!! Oh my!

A bevy of posts to come. See you in a week.
