[News Hotpot]
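"India's AI dialect showdown: 22 official languages plus hundreds of dialects, and even 'ring bell' has to be translated into Marathi!" Delivery driver Vineet no longer has to guess at English orders: his delivery app now speaks six Indian languages. But experts face a headache: how can AI handle all 22 official languages without pushing tribal dialects into a corner? Better still, professors at IIT Mumbai are developing an AI to help people quit smoking, which they hope will one day urge you to stop in 22 languages! (Image: meme of an Indian delivery driver staring at a translation screen on his phone)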
---
**How to get AI to work in 22 languages**
Priti Gupta, Technology Reporter, reporting from Mumbai. Published 2 days ago.
Photo: Priti Gupta. Translation tech has made work easier for Vineet Sawant
Vineet Sawant has spent the last two years navigating the streets of Mumbai on a scooter as a delivery driver. "Being on the road is always very stressful and especially in cities like Mumbai," he says. But when he started out, language barriers were an additional problem. His first language is Marathi, and Mr Sawant speaks "very little" English. "I can understand but it's very difficult to read," he explains. That caused problems at his new job. He said: "At first, it was difficult. Everything was in English, and I can understand some of it, but I'm more comfortable in Marathi. I used to ask other delivery guys to help me figure out what to do." His employer, Zepto, promises "India's Fastest Online Grocery Delivery", so having drivers struggle with delivery instructions was not ideal. To smooth the process, Zepto partnered a year ago with Reverie Language Technologies to introduce an AI translation service for its drivers. Since then, its delivery drivers have been able to choose between six languages in the Zepto app. "I don't have to guess anymore," says Mr Sawant. "Earlier I would take more time to read and sometimes even made mistakes. Now if the customer writes 'ring bell', I get that instruction in Marathi. So, I don't have to ask or check again. It's all clear."
Photo: Getty Images. India has 22 official languages and hundreds of dialects
Mr Sawant's difficulties are common. "India has 22 official languages and hundreds of dialects," says Professor Pushpak Bhattacharyya of IIT Mumbai, one of India's leading experts in the use of AI for Indian languages. "Without tech that understands and speaks these languages, millions are excluded from the digital revolution - especially in education, governance, healthcare, and banking," he points out. The rollout of new generative AI systems, like ChatGPT, has made the task more urgent. An AI is trained on vast amounts of data, such as web pages, books or video transcripts. For widely spoken languages like Hindi and English, such data is relatively easy to get, but for other languages it is more difficult. "The main challenge to create Indian language models is the availability of data. I'm talking about refined data. Coarse quality data is available. But that data is not of very high quality, it needs filtering," says Professor Bhattacharyya. "The issue in India is for many Indian languages, especially tribal and regional dialects, this data simply doesn't exist or is not digitised."
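To make Professor Bhattacharyya's distinction between "coarse" and "refined" data concrete, here is a minimal sketch of one common kind of cleaning pass. It is a generic heuristic, not Bhashini's actual pipeline: it assumes the target is a Devanagari-script language such as Marathi and keeps only lines that are long enough, written mostly in that script, and not already seen. The function names and thresholds (`min_ratio=0.8`, `min_chars=20`) are illustrative assumptions.

```python
# Devanagari Unicode block (U+0900 to U+097F), used by Marathi and Hindi.
DEVANAGARI = range(0x0900, 0x0980)


def devanagari_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Devanagari block.

    Vowel signs and viramas are combining marks, so str.isalpha() skips
    them; only base letters are counted.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(1 for ch in letters if ord(ch) in DEVANAGARI)
    return in_script / len(letters)


def filter_corpus(lines, min_ratio=0.8, min_chars=20):
    """Keep lines that are long enough, mostly Devanagari, and not duplicates.

    The thresholds are illustrative assumptions, not tuned values.
    """
    seen = set()
    kept = []
    for line in lines:
        text = " ".join(line.split())  # collapse runs of whitespace
        if len(text) < min_chars:
            continue  # too short to be useful training text
        if devanagari_ratio(text) < min_ratio:
            continue  # mostly another script, e.g. English
        if text in seen:
            continue  # exact duplicate after normalisation
        seen.add(text)
        kept.append(text)
    return kept


if __name__ == "__main__":
    raw = [
        "कृपया दाराची बेल वाजवा आणि पार्सल द्या",
        "Please ring the bell when you arrive",
        "Order #4521 ready",
        "कृपया  दाराची बेल   वाजवा आणि पार्सल द्या",  # same text, extra spaces
    ]
    for line in filter_corpus(raw):
        print(line)
```

Real pipelines usually add language identification, quality scoring and near-duplicate detection on top of this. Script ratio alone cannot, for example, tell Marathi from Hindi, since both are written in Devanagari.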
Reverie Language Technologies is now deploying its AI-driven translation technology for a range of Indian companies. Co-founder Vivekananda Pani says that while translation technology will make communication easier, there is "potential for less common dialects to be pushed aside". "The challenge will be to make sure that the amazing benefits of AI-driven language advancements don't accidentally shrink the rich variety of human language."
To help tackle the problem, Professor Bhattacharyya has contributed to Bhashini, a government project to develop the high-quality datasets needed to train an AI. As well as the datasets, Bhashini has built AI language models and translation services in 22 languages. Started in 2022, it is a huge undertaking but has already made a lot of progress. Bhashini currently hosts 350 AI-based language models that have processed more than a billion tasks. More than 50 government departments work with Bhashini, as well as 25 state governments. For example, Bhashini tech is used in multilingual chatbots for public services and to translate government schemes into local languages. "Bhashini ensures India's linguistic and cultural representation by building India-specific AI models rather than relying on global platforms," says Amitabh Nag, CEO of Digital India, Bhashini Division. He hopes that within the next two or three years rural users will have voice-enabled access to government services, financial tools and information systems in their native languages.
Photo: Getty Images. Indian researchers are developing an AI to help smokers quit
It is hoped that these India-focused datasets will one day give developers of AI-based models the tools to adapt them much more easily for the entire population. Currently, designing any AI programme to deal with complex processes such as healthcare can be extremely challenging. Kshitij Jadhav, an associate professor at the Koita Centre for Digital Health at IIT Mumbai, is working on an AI programme to help people quit smoking. He explains that people at different stages of the process need different advice, and they usually need a well-trained human to make that assessment. But there is a limited number of practitioners who can help, particularly those who can operate in multiple languages, so Professor Jadhav hopes his AI model can bridge the gap. The AI "will first identify the kind of conversation the person needs and accordingly will frame questions, show empathy, emotions," says Professor Jadhav. All of that, it is hoped, will eventually be done in 22 languages. Initial experiments are underway in English and Hindi. "It will be very customized, it will not be something just off the shelf," he says.
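Professor Jadhav describes a two-step flow: first work out what kind of conversation the person needs, then frame questions to match. The toy sketch below illustrates only that shape and is not his system. The three stages, the keyword cues and the question wording are all hypothetical; a real model would use a trained classifier and clinically reviewed prompts. Questions are looked up by stage and language code, falling back to English when a translation (for example Hindi, `"hi"`) has not been written yet.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical readiness stages for someone who smokes."""
    NOT_READY = "not_ready"      # not yet considering quitting
    CONSIDERING = "considering"  # thinking about quitting
    QUITTING = "quitting"        # actively trying to quit


# Hypothetical keyword cues, checked in this order; a real system would
# use a trained classifier rather than substring matching.
CUES = {
    Stage.QUITTING: ("stopped", "quit last", "craving"),
    Stage.CONSIDERING: ("want to quit", "thinking of quitting", "should quit"),
}

# Hypothetical question templates keyed by (stage, language code).
# Only English is filled in; Hindi ("hi") and other languages would be
# added as further entries.
QUESTIONS = {
    (Stage.NOT_READY, "en"): "What do you enjoy about smoking, and does anything about it worry you?",
    (Stage.CONSIDERING, "en"): "What would make quitting feel more doable for you?",
    (Stage.QUITTING, "en"): "Cravings are hard. When do they hit you most?",
}


def identify_stage(message: str) -> Stage:
    """Step 1: guess what kind of conversation the person needs."""
    text = message.lower()
    for stage, cues in CUES.items():
        if any(cue in text for cue in cues):
            return stage
    return Stage.NOT_READY  # default when no cue matches


def next_question(message: str, lang: str = "en") -> tuple[Stage, str]:
    """Step 2: frame a question for that stage, in `lang` if a template exists."""
    stage = identify_stage(message)
    question = QUESTIONS.get((stage, lang), QUESTIONS[(stage, "en")])
    return stage, question


if __name__ == "__main__":
    print(next_question("I want to quit but I keep lighting up", lang="hi"))
```

Keeping the stage check separate from the question lookup means new languages can be added as data, without touching the classification step.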