Ankur.org.in

Project Ideas



Ankur.org.in welcomes all students, developers and mentors to participate in the Google Summer of Code 2012[1]. Ankur.org.in will be applying to be a mentoring organization. The project proposal is an important component of your application, and your application will be judged on it. As a rule of thumb, projects and mentors look for a clearly articulated proposal that sets out a plan of action with timelines and provides realistic checkpoints for measuring progress. The application template you will be provided will cover the following points:

  • self introduction (courses attending etc)
  • what is the scope of the proposal (especially what is outside the scope)
  • familiarity with the tools, infrastructure and concepts of the project
  • how many hours per week will you be able to commit to your project
  • do you want to inform the project about any other commitments you have
  • how will you adjust to a mentor who will be virtually present
  • are you comfortable in English

Through its involvement in the Google Summer of Code program, Ankur.org.in aims to achieve some of the goals it has set itself for this year. These relate to the availability of simpler and more reliable tools that let end-users of platforms communicate and share knowledge in their local language. The project proposals are constructed to enable the architecture and development of ‘frameworks’ rather than language-specific tools. This would allow a larger part of the work to be re-used and improved upon through contributions from other language communities, especially Indic language communities.

Given below is a set of ideas around which we would like to see proposals. These are by no means the only ideas we will consider. An interesting approach to solving a relevant problem in the domain of language technologies is always welcome, and we encourage you to discuss it on our mailing list or on IRC. The earlier we begin the conversation, the easier it will be to familiarize ourselves with each other's patterns and ways of working. This will help us ensure that the selected project ideas are successfully delivered.

Our mailing list is project-ideas @ lists dot ankur dot org dot in. The subscription interface is available from http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in. We also use IRC for discussion, and our channel is #ankur.org.in on Freenode IRC. If you are interested in any of the ideas, please do get in touch with the mentors (their IRC nicknames are provided alongside their names).

An application UI testing framework for validating translation completeness and quality

Mentor: Runa Bhattacharjee (arrbee)

A typical problem in the translation workflow is incomplete coverage of translated strings for an application, which creates an inconsistent experience for the end user. The basic premise of this project idea is to enable a desktop application testing tool or framework to check a GUI for consistency of translations, for coverage, and for whether the translated UI contravenes known UI guidelines. Knowledge of available testing frameworks and the ability to develop code in a scripting language such as Python are required. This is an infrastructure/automation-centric project which will help improve the quality and consistency of translated interfaces. Existing tools and scripts do parts of the outlined idea, but a unified approach to the solution is lacking.
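As a rough illustration of the coverage part of this idea, the sketch below counts translated and untranslated strings in a simplified PO-style catalogue. The function name `coverage`, the sample catalogue and the restriction to single-line `msgid`/`msgstr` pairs are assumptions made for this example; a real framework would also have to inspect the running GUI.

```python
import re

# Matches a single-line msgid/msgstr pair. Multi-line PO strings, plural forms
# and comments are deliberately out of scope for this sketch.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def coverage(po_text):
    """Return (translated, total, untranslated_ids) for a simplified PO text."""
    # The PO header is stored under an empty msgid, so it is skipped.
    pairs = [(src, dst) for src, dst in ENTRY.findall(po_text) if src]
    missing = [src for src, dst in pairs if not dst]
    return len(pairs) - len(missing), len(pairs), missing


# Hypothetical sample catalogue used only for demonstration.
SAMPLE = '''msgid ""
msgstr ""

msgid "Open"
msgstr "খুলুন"

msgid "Save"
msgstr ""

msgid "Quit"
msgstr "প্রস্থান"
'''

if __name__ == "__main__":
    translated, total, missing = coverage(SAMPLE)
    print(f"{translated}/{total} translated; missing: {missing}")
```

The list of missing source strings is what a UI test could then look for on screen to confirm that untranslated text is actually visible to the user.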

A validation system for translated strings based upon Translation Style Guides of language communities

Mentor: Runa Bhattacharjee (arrbee)

Every language community creates a set of Style Guides for translations. These describe the specific ways in which translatable aspects such as trademarks, shortcuts, hot-keys and accelerators are translated. The proposal intends to develop a set of validation methods which accept a Style Guide as input, test a corpus of translated files against it, and generate a result which can also be used to score the quality of a translation. Knowledge of Style Guides is preferable. The ability to develop code in scripting languages or web frameworks is required. This is an infrastructure project aimed at someone who would like to begin contributing towards i18n/l10n development. Adequate guidelines are available in the form of style guides. The intent of this project is to convert such guidelines into a scoring mechanism which will enable teams to reach conclusions on the quality of the translations. Familiarity with the plug-in systems of existing translation content management systems is preferable but by no means mandatory.
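One possible shape for such a scoring mechanism is sketched below. It assumes just two illustrative rules (an accelerator marker present in the source must also appear in the translation, and protected terms such as trademarks must be left untranslated); the function names, the `_` marker and the sample strings are hypothetical and are not taken from any particular Style Guide.

```python
def check_pair(source, target, protected_terms, accel="_"):
    """Return a list of rule violations for one source/translation pair."""
    problems = []
    # Assumed rule 1: the accelerator marker must be present in both or neither.
    if (accel in source) != (accel in target):
        problems.append("accelerator")
    # Assumed rule 2: protected terms (e.g. trademarks) stay untranslated.
    for term in protected_terms:
        if term in source and term not in target:
            problems.append("protected term: " + term)
    return problems


def score(pairs, protected_terms):
    """Fraction of pairs without violations (1.0 means fully compliant)."""
    if not pairs:
        return 1.0
    clean = sum(1 for s, t in pairs if not check_pair(s, t, protected_terms))
    return clean / len(pairs)


# Hypothetical sample data for demonstration only.
PAIRS = [
    ("_Open", "খুলুন (_O)"),
    ("_Save", "সংরক্ষণ"),
    ("About Firefox", "Firefox পরিচিতি"),
    ("Firefox Help", "ফায়ারফক্স সহায়তা"),
]

if __name__ == "__main__":
    for src, dst in PAIRS:
        print(src, "->", check_pair(src, dst, ["Firefox"]))
    print("score:", score(PAIRS, ["Firefox"]))
```

Keeping each rule as a small independent check would make it straightforward to load a different rule set per language community.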

A translation editor for DTD resources

Mentor: Shreyank Gupta (shrink)

Translating DTD resource files (e.g. those used in Mozilla application translation) and XML files in a translation editor often requires that they first be converted into a suitable file format (like .PO); otherwise they have to be edited in a simple text editor. This results in a non-scalable workflow for translators, who lack the source language text and references that are essential. This project aims to achieve a suitable interface for DTD and XML files, where the translatable strings and the translation space are presented simultaneously. This blog post aims to provide answers to a few questions one may have.
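To make the "simultaneous presentation" idea concrete, the following sketch pairs the entities of a source DTD with those of a translated DTD. It is only an illustration: the regular expression handles double-quoted `<!ENTITY name "value">` declarations only, and the function names and sample entities are assumptions for this example.

```python
import re

# Handles only double-quoted entity values; single-quoted values, comments and
# parameter entities are out of scope for this sketch.
ENTITY = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+"([^"]*)">')


def parse_dtd(text):
    """Map entity names to their string values, in file order."""
    return dict(ENTITY.findall(text))


def side_by_side(source_dtd, target_dtd):
    """Return (key, source string, translation) rows; untranslated rows get ''."""
    source = parse_dtd(source_dtd)
    target = parse_dtd(target_dtd)
    return [(key, value, target.get(key, "")) for key, value in source.items()]


# Hypothetical sample files for demonstration only.
EN = '<!ENTITY fileMenu.label "File">\n<!ENTITY quitCmd.label "Quit">\n'
BN = '<!ENTITY fileMenu.label "ফাইল">\n'

if __name__ == "__main__":
    for key, src, dst in side_by_side(EN, BN):
        print(f"{key:20} {src:10} {dst or '(untranslated)'}")
```

An editor built on rows like these could show the source text next to an editable translation field without an intermediate conversion to another format.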

A Glossary Tool to index all available localizable strings for Bengali across various FOSS projects and other available corpora

Mentor: Runa Bhattacharjee (arrbee)

Greater consistency and higher quality of translations are the aims of any language community. This project proposal envisages a Glossary Tool which can generate a reference standard for ensuring consistency of translations, along with creating a self-learning glossary. This is an infrastructure project aimed at someone who would like to begin contributing towards i18n/l10n development. The project phase would include a study of the various glossaries available. Assessing any existing tool which provides similar, if reduced, functionality is also expected. The tool is not required to be a desktop application, so the interested student can shape the proposal according to their strengths.
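A minimal sketch of the indexing step might look like the following. It assumes that source/translation pairs have already been extracted from the various projects; the function names and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict


def build_glossary(pairs):
    """Index translations per source string, counting how often each occurs."""
    index = defaultdict(Counter)
    for source, target in pairs:
        # Source strings are compared case-insensitively in this sketch.
        index[source.strip().lower()][target.strip()] += 1
    return index


def inconsistent(index):
    """Return source strings that have more than one distinct translation."""
    return {src: dict(counts) for src, counts in index.items() if len(counts) > 1}


# Hypothetical pairs, as if collected from several projects.
PAIRS = [
    ("File", "ফাইল"),
    ("file", "ফাইল"),
    ("Save", "সংরক্ষণ"),
    ("Save", "সংরক্ষণ করুন"),
]

if __name__ == "__main__":
    glossary = build_glossary(PAIRS)
    print(inconsistent(glossary))
```

The per-translation counts give a simple basis for suggesting the most widely used rendering as the reference form.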

New Visual Keyboard for Bengali

Mentor: Runa Bhattacharjee (arrbee)

The popular keyboard layouts in use for Bengali follow a non-visual style of typing, i.e. the characters are not typed in the sequence in which they are displayed. The non-visual style follows a uniform method of typing the characters according to their type (consonants, independent vowels, dependent vowels, special characters, conjunct characters) and is defined by specific rules. This method of writing is already prevalent. However, it often poses a learning challenge for new users who are more practised in the conventional visual writing method. Visual typing methods for complex scripts like Bengali are challenging to create, especially with corner cases like split vowels. However, they have been done before for a few other complex scripts.
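The reordering problem can be illustrated with a small sketch that converts a visually typed key sequence into the logical order in which Unicode stores Bengali text. It is a simplification under stated assumptions: only the three left-side vowel signs and the two split vowels are handled, consonants are detected by a code point range, and the function names are made up for this example.

```python
# Vowel signs drawn to the left of the consonant, so typed first in a visual style.
PREBASE = {"\u09BF", "\u09C7", "\u09C8"}  # i, e, ai signs
HASANT = "\u09CD"
# Split vowels: left part + right part typed around the consonant.
SPLIT = {
    ("\u09C7", "\u09BE"): "\u09CB",  # e sign + aa sign -> o sign
    ("\u09C7", "\u09D7"): "\u09CC",  # e sign + au length mark -> au sign
}


def is_consonant(ch):
    # Simplified test: the main consonant range plus three additional letters.
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09DC\u09DD\u09DF"


def visual_to_logical(keys):
    """Move a pre-base vowel sign after the consonant cluster that follows it."""
    out = []
    pending = None  # pre-base vowel sign waiting for its consonant cluster
    i = 0
    while i < len(keys):
        ch = keys[i]
        if ch in PREBASE and pending is None:
            pending = ch
            i += 1
            continue
        if pending is not None and is_consonant(ch):
            out.append(ch)
            i += 1
            # Consume the rest of a conjunct: (hasant + consonant)*
            while i + 1 < len(keys) and keys[i] == HASANT and is_consonant(keys[i + 1]):
                out.extend(keys[i:i + 2])
                i += 2
            # Merge the two halves of a split vowel if the right half follows.
            if i < len(keys) and (pending, keys[i]) in SPLIT:
                out.append(SPLIT[(pending, keys[i])])
                i += 1
            else:
                out.append(pending)
            pending = None
            continue
        out.append(ch)
        i += 1
    if pending is not None:
        out.append(pending)
    return "".join(out)


if __name__ == "__main__":
    print(visual_to_logical("\u09C7\u0995"))        # e sign typed before ka
    print(visual_to_logical("\u09C7\u0995\u09BE"))  # split vowel around ka
```

A production input method would need many more rules (for example reph and other special forms), which is part of what makes this project challenging.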

New keyboard layout specific to input on mobile devices

Mentor: Sankarshan Mukhopadhyay (_sankarshan)

With Android 4.x, support for Indic languages is available on mobile devices. The proposal involves creating a keyboard layout for easy and efficient entry of Bengali text on such devices. The layout should also enable the gathering of user feedback and a usage model so as to allow improvements. This is an exploratory project and would require from the interested student a higher degree of self-motivation along with an interest in quantitative analysis of data. There are numerous keyboard layouts and input methods available. Prior knowledge of keyboard layouts is preferred, as it would provide the baseline for decisions about how to arrange the keys in the constrained UI of a mobile device. Indicators of reasonable success in the project are quick uptake and adoption of the layout and the ease of creating layouts for any language, leading to popularity.

Improve the accuracy of OCR tools for Bengali language to 98%

Mentor: Sankarshan Mukhopadhyay (_sankarshan)

Existing Free and Open Source software for OCR produces results with a significantly high error rate. The proposal requires a study of the currently open items for any existing tool and the development of patches which would improve the accuracy of the software to ~98%. This is a risky/exploratory project. Current methods of OCR for Indic languages have reached something of a plateau. The intent of this project is to devise technology constructs which will help improve the accuracy. Attaining such a goal would require prior knowledge of the existing tools and programs along with a deep understanding of the current problem sets.
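The page does not say how the ~98% figure is to be measured. One common choice, assumed here purely for illustration, is character accuracy: one minus the edit distance between the OCR output and a reference transcription, divided by the length of the reference. The function names below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Character accuracy in [0, 1], counted per Unicode code point."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


if __name__ == "__main__":
    print(char_accuracy("abcd", "abxd"))
```

For Bengali, counting per code point treats a consonant and its vowel sign as separate units; a proposal might prefer to count whole clusters instead, which would give different numbers.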

Improving information retrieval methods for OCR data sets consisting of Indic scripts

Mentor: Sankarshan Mukhopadhyay (_sankarshan)

The availability of archived and digitized documents in Indic scripts has gradually increased in recent times. The project aims to improve existing methods and algorithms for retrieving information from digitized text. More effective information retrieval from such text allows the content to be made available via standardized and structured text processing software. Current methods of retrieval suffer significant degradation, making information retrieval and use ineffective. As part of the proposal, the search algorithms should make use of all additional methods of error correction to improve performance. This is an exploratory/research-centric project.

Improving models for Cross Language Text Re-use

Mentor: Sankarshan Mukhopadhyay (_sankarshan)

Improving existing models for Cross Language Text Re-use enables detection of source documents and is a path towards suggesting improvements in translation quality. The current approaches to detecting Cross Language Text Re-use have specific limitations, and the proposal aims to arrive at a better model that specifically and accurately identifies source documentation in order to enable scoring of the translated text. This is an exploratory/research-centric project. However, the need for a system that identifies candidate texts with a high degree of accuracy has been felt for a while. The interested candidate should be able to provide a technological breakthrough that leads to an implementation, and adequate datasets would be used to arrive at an acceptable benchmark for accuracy.

Design and development of a print ready OpenType font for Bengali

Mentor: Sayamindu Dasgupta (sayamindu)

The available fonts for the Bengali language have not been specifically developed for use in printed content. The proposal requires the design and development of an OpenType font for Bengali which is suitable for printed content, has aesthetic appeal, and is compliant with the current specifications mandated by the Unicode Consortium. A specific set of test cases for the font would also need to be developed. Prior knowledge of font design/development, familiarity with the specifications and discussions of the Unicode Consortium, and familiarity with font development tools on Linux are preferred. This is core development work as part of the organization’s focus area.

Develop a system with multi-lingual capabilities to answer user-specific queries

Mentor: Sankarshan Mukhopadhyay (_sankarshan)

FAQ-based systems which provide responses to user-specific questions are fairly common and popular. However, current systems are mostly limited to handling question-answer pairs in English. Additionally, query text as typically entered by users introduces errors into the flow, increasing the complexity of a successful transaction. This proposal is limited to a language-pair system in which the questions and answers could be in English or Bengali. The system should be able to identify the question from a set of available questions and deliver the appropriate response to the user. A highly accurate system would enable the creation of FAQ-like content for various services using language modules, and this framework could then provide a seamless reading experience to users.
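As a very small illustration of matching an error-prone query against a set of known questions, the sketch below uses approximate string matching from Python's standard `difflib` module. This is one simple technique chosen for the example, not the approach the project prescribes; the FAQ entries, the `answer` function and the cutoff value are hypothetical.

```python
import difflib

# Hypothetical FAQ content; keys are the known questions in normalised form.
FAQ = {
    "how do i reset my password": "Use the reset link on the login page.",
    "how do i install bengali fonts": "Install the fonts package for your distribution.",
}


def answer(query, faq=FAQ, cutoff=0.8):
    """Return the answer for the closest known question, or None if none is close."""
    normalised = query.strip().lower()
    matches = difflib.get_close_matches(normalised, list(faq), n=1, cutoff=cutoff)
    return faq[matches[0]] if matches else None


if __name__ == "__main__":
    print(answer("How do I reset my pasword"))  # typo in the query
    print(answer("what is the weather"))        # no sufficiently close question
```

Because the comparison works on plain Unicode strings, the same lookup could hold Bengali questions as well, although a real system would likely need language-aware normalisation rather than character similarity alone.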

Add language grammar rules to a machine translation system

Mentor: Sayamindu Dasgupta (sayamindu)

Existing Machine Translation systems are less accurate when translating from English to Bengali. The proposal involves documenting existing language grammar rules and developing enhancements to an existing machine translation system.

Add a language model for speech recognition software for Bengali language

Mentor: Sayamindu Dasgupta (sayamindu)

Develop a language model for speech processing by extending a freely available corpus.


See Also:
GSoC 2012 – Ankur-India

Copyright © 2005‒2011 Ankur.org.in.
Free to share and remix with attribution: Creative Commons CC-BY-SA.
