Contributing to building a vernacular language translation system at the prestigious IIT Bombay
IIT Bombay main building
IIT Bombay main building (Image source: Quora)
Date
-
Context
  • Internship
Skills
  • Statistical machine learning
  • Code optimisation

Indian Institute of Technology Bombay is one of the most respected institutions in the world, and I feel extremely grateful to have gotten an opportunity to work at a place of which only a few dream of ever seeing the entrance gate.

Background

Statistical Machine Learning is a framework of machine learning that draws data from statistics and functional learning. Unlike machine learning where the primary goal is to make accurate predictions, the goal of statistical machine learning is drawing inferences about the relationship between variables. This technique is therefore a preferred mode of training computer models for language processing and speech recognition where the majority of the data is connected to other parts of data.

Work

The project that I was a part of was involved in translating a corpus written in English language into Hindi. We split the entire corpus into chunks of sensible information and translated individual chunks into Hindi while maintaining the tone of the passage. An example could be given as, "Mr Kennedy was the President of America. He was 6 feet tall." The statement given before would be translated as, "केनेडी जी अमरीका के राष्ट्रपति थे। वे ६ फ़ीट लम्बे थे।"

"A simulation of the system"

In the simulation example shown above, the system automatically understands that the reference to Mr Kennedy is repeated and the noun can be replaced with a pronoun. While this task laid the foundation of how the system should function, it lacked a few things to make it usable:

  • The chunks were not properly cut into sensible pieces
  • The translation was a bit wonky at times
  • The UI was not that user-friendly

This is where my role came in. I was involved in optimising the entire system by properly separating the corpus into sensible chunks and improving the UI. A majority of my time was spent trying to understand the flow of code and tweaking the logic to improve the chunking methods.

Closing remarks

While working at IIT Bombay, I realised that the biggest systems in the world are built by sheer hard work and consistency, and the whole experience taught me a valuable lesson in handling pressure. I am forever grateful for that.

In case you are interested in learning more about the system, here is the explanation of how the system functions.


Acknowledgements

Vishwajeet for teaching me the do's and don'ts of leadership and management

Next story
Laksh Solar