Handwritten Word Spotting in Indic Scripts using

2015 3rd IAPR Asian Conference on Pattern Recognition

Handwritten Word Spotting in Indic Scripts using Foreground and Background Information

Ayan Das, Dept. of ECE, IEM, Kolkata, India, dasayaniem@gmail.com
Ayan Kumar Bhunia, Dept. of ECE, IEM, Kolkata, India, ayanbhunia@gmail.com
Umapada Pal, CVPR Unit, ISI, Kolkata, India, umapada@isical.ac.in
Partha Pratim Roy, Dept. of CSE, IIT Roorkee, India, proyfcs@iitr.ac.in

Abstract

In this paper we present a line-based word spotting system based on Hidden Markov Models for offline Indic scripts such as Bangla (Bengali) and Devanagari. We propose a novel approach of combining foreground and background information of text line images for keyword spotting by character filler models. The candidate keywords are searched in a line without segmenting characters or words. A significant improvement in performance is noted when using both foreground and background information rather than either one alone. The Pyramid Histogram of Oriented Gradient (PHOG) feature has been used in our word spotting framework, and it outperforms other existing features for word spotting. The framework of combining foreground and background information has also been evaluated on the IAM dataset (English script) to show the robustness of the proposed approach.

Introduction

Handwritten text recognition is still one of the challenging problems in the field of pattern recognition. Due to the free-flowing nature of handwriting and the many writing variations, recognition performance is not satisfactory even with sophisticated preprocessing and OCR techniques. A special form of word recognition, the so-called “Word Spotting”, has been proposed […] that does not require OCR of the entire document. Word spotting has been extensively studied […] to detect a word in a handwritten document page (or line) according to the user's query keyword […] or a template image […]. This fast searching or browsing approach often overcomes the problems of conventional recognition. Text search using word spotting techniques […] provides an alternative approach for indexing and retrieval. As a result, it has been popular for extracting information from historical documents, handwritten forms, etc.

Word spotting with the Query By Example (QBE) principle takes instances of the query word image for searching, whereas Query By Text (QBT) […], which uses a learning-based approach for retrieval, has recently proved more effective. This paper presents a Query By Text based word spotting system for segmented text lines using Hidden Markov Models. We propose a novel approach of combining foreground and background information of text images for keyword spotting by character filler models. The candidate keywords are searched in a line without segmenting characters or words. A significant improvement in performance is noted when using both foreground and background information rather than either one alone. The framework is applied to Indic scripts such as Bengali and Devanagari, along with Latin script, for evaluation.

978-1-4799-6100-9/15/$31.00 ©2015 IEEE

Fig. … Examples of word spotting in Bangla and Devanagari script. The first two (Bengali) lines were searched with the keyword “ïáĘƣč” and the last two (Devanagari) lines were searched with the keyword ͨ›ȡ•

Related Work

Handwritten word spotting […] is traditionally viewed as an image matching task between one or multiple query word images and a set of candidate word images in a database […]. Query by example (QBE), or image template matching [10], was adopted by researchers in the early days of word spotting. The modern approach, query by text (QBT) or the learning-based approach […], outperforms the older one and is used extensively in recent systems. Some works consider character-template-based spotting […], whereas others perform spotting at the word level. Several works address applications of word spotting, such as keyword finding in historical documents […] and searching and browsing through a digitized document. A script-independent word spotting method has also been proposed recently […]. Word spotting has also been considered for retrieving important information from poorly written old documents […]. Several local features have been used to achieve better performance, some of which outperformed others in conjunction with Dynamic Time Warping (DTW). In another kind of approach, keyword spotting is done at the character level using a BLSTM-NN (bidirectional long short-term memory neural network) […]. There exist several page-level, segmentation-free techniques that use scale-invariant features (i.e., SIFT) […]. Recently, Fischer et al. […] described word spotting performance using a character filler model with the Marti-Bunke feature.

The contributions of this paper are the following: (1) a unique feature extraction method for word spotting that uses a combination of foreground and background information; (2) an analysis of the word spotting framework on Indic scripts, namely Bangla and Devanagari; (3) a test of the system on the IAM dataset of English to assess the robustness of our approach. A comparative study between the PHOG and LGH features has also been performed to evaluate their performance in word spotting for Indic scripts.

The rest of the paper is organized as follows. The word spotting framework is explained in detail in Section II. We demonstrate the performance of our novel feature extraction for word spotting in Section III. Finally, conclusions and future work are presented.

Proposed Approach on Word Spotting

The major goal of word spotting is to detect a specific keyword in a pool of document images. Our system is able to search for arbitrary words in text lines. For this purpose, the document image is first binarized with a global binarization method. Next, the binary document is segmented into individual text lines using a line segmentation algorithm […]. For skew correction, we consider all the points at the extreme bottom of the text strokes and apply linear regression to these points to find the best-fitting line. The slope δ of this straight line represents the skew of the text, and a rotation by δ is then applied to correct it. After skew correction, each text line is normalized to cope with different handwriting styles.

Fig. … Proposed word spotting framework.

Fig. … provides a graphical description of the word spotting framework, in which the concatenated features are fed to the HMM. Word spotting is performed using text line scoring based on the filler and character models of the HMM. For the word spotting system we use a novel feature extraction technique: the concatenation of foreground and background features. The details of each step are described below.

Feature extraction

A feature is a representation of an image that is more discriminative than the image itself. The PHOG feature has been found to provide better results in Bangla handwritten script recognition […]. PHOG […] is a spatial shape descriptor that describes an image by its spatial layout and local shape, comprising the gradient orientation at each pyramid resolution level. To extract the feature from each sliding window, we divide the window into cells at several pyramid levels. The grid has $4^N$ individual cells at resolution level N (i.e., N = …). A histogram of the gradient orientation of each pixel is calculated for each of these cells and quantized into L bins. Each bin corresponds to a particular octant of the angular (radian) space. The concatenation of the feature vectors from all pyramid resolution levels gives a …-dimensional feature vector, considering … bins and limiting the level to N = … in our implementation.

For calculating background information, we take into account the morphology of the character sets of the Bangla and Devanagari scripts. In both scripts, most characters have a horizontal line (Shirorekha) at the upper part. When two or more characters sit side by side to form a word, the horizontal lines of the characters touch and generate a long line called the headline. Because of this touching nature, the characters in a word create big white regions (spaces) in Bangla and Devanagari scripts. These empty spaces are found by the water reservoir principle […]. Each pair of joined characters gives a unique reservoir formation, so the reservoirs carry information about the combination of characters forming the word. Fig. … shows the formation of bottom reservoirs for Devanagari and Bangla text lines, respectively.
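The skew-correction step described at the start of this section (a regression line through the lowest stroke points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is assumed to be a binary list of rows with 1 for ink, only the lowest ink pixel of each column is used, and the function names `bottom_profile`, `fit_line` and `skew_angle_degrees` are hypothetical.

```python
import math

def bottom_profile(binary):
    """Return (x, y) of the lowest ink pixel in every non-empty column.

    `binary` is a list of rows; 1 marks ink, 0 marks background.
    The row index grows downwards, as in image coordinates.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    points = []
    for x in range(width):
        for y in range(height - 1, -1, -1):  # scan each column from the bottom up
            if binary[y][x]:
                points.append((x, y))
                break
    return points

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two profile points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def skew_angle_degrees(binary):
    """Skew estimate: the angle of the regression line through the bottom profile."""
    slope, _ = fit_line(bottom_profile(binary))
    return math.degrees(math.atan(slope))

# Toy example: a stroke that descends one pixel per column.
diagonal = [[1 if y == x else 0 for x in range(5)] for y in range(5)]
angle = skew_angle_degrees(diagonal)
```

Because rows grow downwards, a positive angle here means the baseline descends to the right; the line image would then be rotated in the opposite direction to deskew it.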

Fig. … Water reservoir formation in (a) Devanagari and (b) Bangla text line images; the position of the sliding window is marked in red.
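To make the idea of bottom reservoirs concrete, the sketch below computes, for each column, how much "water" would be held if it were poured from below into the cavities hanging under the headline. It is a deliberately simplified one-dimensional approximation that looks only at the lowest ink pixel of each column; the water reservoir method cited in the text works on the actual stroke shapes and is not reproduced here. The function name `bottom_reservoir_depths` is hypothetical.

```python
def bottom_reservoir_depths(binary):
    """Per-column reservoir depth for water poured from below (1-D approximation).

    `binary` is a list of rows; 1 marks ink, 0 marks background.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0

    # Lowest ink row in each column; None for columns without ink.
    floor = []
    for x in range(width):
        rows = [y for y in range(height) if binary[y][x]]
        floor.append(max(rows) if rows else None)

    depths = [0] * width
    start = 0
    while start < width:
        if floor[start] is None:  # an empty column cannot hold water
            start += 1
            continue
        end = start
        while end < width and floor[end] is not None:
            end += 1
        run = floor[start:end]

        # Two-pass "trapped water" computation on this run of inked columns.
        left_max, best = [], -1
        for h in run:
            best = max(best, h)
            left_max.append(best)
        right_max, best = [0] * len(run), -1
        for i in range(len(run) - 1, -1, -1):
            best = max(best, run[i])
            right_max[i] = best
        for i, h in enumerate(run):
            depths[start + i] = min(left_max[i], right_max[i]) - h
        start = end
    return depths

# Two vertical strokes joined by a headline: the gap between them is a reservoir.
word = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
depths = bottom_reservoir_depths(word)
```

The non-zero depths mark the background region enclosed between the two strokes and the headline, which is the kind of region from which background features are extracted.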

We calculate the PHOG feature from the foreground as well as from the background regions formed by the reservoirs. These features are then concatenated to form the final feature of the text line image. Fig. … illustrates the feature extraction technique.

Fig. … Feature extraction method shown graphically. The features are extracted from the sliding window marked in red.
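The pyramid histogram described above can be sketched as follows. This is a simplified illustration, not the authors' code: gradients are taken by central differences on every pixel (PHOG as usually described works on edge contours), votes are weighted by gradient magnitude, and the defaults of 8 bins and levels 0 to 2 are assumptions chosen to match the "octant" wording, since the exact values are not legible in the source. The names `phog` and `combined_feature` are hypothetical.

```python
import math

def phog(window, levels=2, bins=8):
    """Pyramid histogram of gradient orientations for one sliding window.

    `window` is a list of rows of numeric pixel values.
    Level l splits the window into 2**l x 2**l cells, i.e. 4**l cells.
    """
    height, width = len(window), len(window[0])

    # One vote per pixel with a non-zero gradient: (row, col, orientation bin, magnitude).
    votes = []
    for y in range(height):
        for x in range(width):
            gx = window[y][min(x + 1, width - 1)] - window[y][max(x - 1, 0)]
            gy = window[min(y + 1, height - 1)][x] - window[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * bins), bins - 1)
            votes.append((y, x, b, magnitude))

    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [0.0] * (cells * cells * bins)
        for y, x, b, magnitude in votes:
            cy = min(y * cells // height, cells - 1)
            cx = min(x * cells // width, cells - 1)
            hist[(cy * cells + cx) * bins + b] += magnitude
        feature.extend(hist)

    total = sum(feature)
    if total > 0:
        feature = [v / total for v in feature]  # L1 normalisation
    return feature

def combined_feature(foreground_window, background_window):
    """Concatenate the foreground and background descriptors of one window position."""
    return phog(foreground_window) + phog(background_window)

# Toy window with a single vertical edge.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
descriptor = phog(step)
```

With these assumed settings each descriptor has 8 × (1 + 4 + 16) = 168 values, and the concatenated foreground and background feature has twice that.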

Text line scoring

The word spotting mechanism is based on scoring the text line image X for the keyword W. If the score is greater than a certain threshold, the keyword is reported as occurring in that text line. The score assigned to the text line image X for the keyword W is based on the posterior probability $P(W \mid X_{a,b})$ obtained from the trained keyword models, where a and b correspond to the starting and ending positions of the keyword and $X_{a,b}$ is the part of the text line containing the keyword […]. Applying Bayes' rule, we get

$$\log p(W \mid X_{a,b}) = \log p(X_{a,b} \mid W) + \log p(W) - \log p(X_{a,b})$$

Assuming equal prior probabilities, we can ignore the term $\log p(W)$. The term $\log p(X_{a,b} \mid W)$ represents the keyword text line model, in which the exact character sequence of the keyword is assumed to be present, separated by 'Space'. The rest of the text line is modeled with the filler text line model. We can then find the position (a, b) of the keyword together with the log-likelihood

$$\log p(X_{a,b} \mid W) = \log p(X_{a,b} \mid K)$$

The term $\log p(X_{a,b})$ is given by the unconstrained filler model F. The general conformance of the text image to the trained character models is given by the obtained log-likelihood

$$\log p(X_{a,b}) = \log p(X_{a,b} \mid F)$$

The difference between the log-likelihood values of the keyword model and the filler model is normalized with respect to the length of the word to obtain the final text line score

$$\mathrm{Score}(X, W) = \frac{\log p(X_{a,b} \mid K) - \log p(X_{a,b} \mid F)}{b - a}$$

This Score(X, W) is then compared with a certain threshold for word spotting.

Hidden Markov Model

In the field of handwritten text recognition, Hidden Markov Models have been used extensively because they remain effective for touching and distorted characters, even without proper preprocessing […]. The simplest model is the character HMM, which consists of J hidden states $(S_1, S_2, \ldots, S_J)$ in a linear topology. For an observation sequence O, the i-th observation $O_i$ represents an n-dimensional feature vector $\mathbf{x}$, modeled using a Gaussian Mixture Model (GMM) with probability $P_{S_j}(\mathbf{x})$, $1 \le j \le J$, given by

$$P_{S_j}(\mathbf{x}) = \sum_{k=1}^{G} W_{jk}\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk})$$

where G is the number of Gaussians and $\mathcal{N}$ refers to a multivariate Gaussian distribution with mean $\boldsymbol{\mu}_{jk}$, covariance matrix $\boldsymbol{\Sigma}_{jk}$ and probability $W_{jk}$ for the k-th Gaussian in state j.

For training the model, feature vectors (the different features have been considered separately) are first extracted from labeled text line images containing multiple words. The probability of the character models of the text line is then maximized by the Baum-Welch algorithm, assuming initial output and transition probabilities. Using the character HMMs, a filler model has been created, which is shown in Fig. …(a). Fig. …(b) shows the keyword model used in our system to spot a keyword in a text line image. The filler model represents a single character model consisting of any character among the 'Char i's, where 1 ≤ i ≤ N (see Fig. …(a)). A 'Space' model is used in the keyword model shown in Fig. …(b) to model white spaces.

Fig. … (a) Filler Model and (b) Keyword Model.
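The two formulas that drive the decision, the GMM emission probability of a state and the length-normalized text line score, can be sketched in a few lines. This is a hedged illustration only: it assumes diagonal covariance matrices (the text speaks of general covariance matrices), it takes the keyword-model and filler-model log-likelihoods as given numbers rather than computing them by HMM decoding, and the function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, variances):
    """Log density of a Gaussian with diagonal covariance (an assumption of this sketch)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, variances)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """log P_Sj(x) = log sum_k W_jk * N(x | mu_jk, Sigma_jk), computed stably.

    All mixture weights are assumed to be strictly positive.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def text_line_score(log_lik_keyword, log_lik_filler, start, end):
    """Score(X, W) = [log p(X_ab | K) - log p(X_ab | F)] / (b - a)."""
    if end <= start:
        raise ValueError("end position must be greater than start position")
    return (log_lik_keyword - log_lik_filler) / (end - start)

def is_spotted(score, threshold):
    """A keyword is reported when the score exceeds the threshold."""
    return score > threshold

# Toy numbers: keyword model fits the span better than the filler model.
emission = gmm_log_likelihood([0.0], [1.0], [[0.0]], [[1.0]])
score = text_line_score(-100.0, -120.0, start=10, end=50)
```

Dividing by the span length keeps scores comparable between short and long keywords, which matters when a single global threshold is applied to all queries.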

Experiment Results

We collected document images written by different professionals. The input to our system can be either an arbitrary keyword string or a text line image. We collected word images from different writers for both Bangla and Devanagari scripts. We then generated a total of … line images for Bangla and … line images for Devanagari, each line containing two to six word images. We also used the IAM (English) dataset. The details of the data used in our experiment are shown in Table 1. Some of the keywords used for word spotting in the three scripts are shown in Table 2. Some qualitative results in the three scripts are shown in Fig. …. Note that the system copes well with handwriting variability and with the spacing between characters in a word.

Table 1. The dataset used for the experiment

              Bangla    Devanagari    IAM (English)
Training      6824      6214          6029
Validation    854       810           822
Testing       914       878           875
Keywords      671       621           821

Fig. … Word spotting performance taking (a) local threshold and (b) global threshold (using foreground information only).

Table 2. Some examples of keywords

Bangla          Devanagari      IAM (English)
Ċđñđþđĉ         ͨ›ȡ•            being
ĺĄĒĉíĠđĊđ       ”ǐš™Ȫ‡“ȡ        House
ćþăđÿŪïƟ        —ȯ‡“ȯ           Government
ćĎđĂñĉē         ͧž€ȡ™           People
ċċēïĉ           \ͬ’€š           would
čáąđĀĀđþđ       ‚Ȫ›ȡ–ȡšȣ        Should

Fig. … shows a comparative evaluation of the combination of foreground and background information against foreground-only and background-only information. We observe a significant improvement in word spotting performance when using our combined feature extraction method.

Fig. … Comparative study of word spotting performance on (a) Bangla and (b) Devanagari, using the concatenation of foreground and background information compared with foreground or background information alone (global threshold).

Fig. … Qualitative results of a few word spotting instances. Spotted words are indicated by red boxes, and results are labeled as correct (tick) or incorrect (cross).

We have also checked the results for different numbers of keywords in our dataset, considering the global threshold and the concatenation of foreground and background features. The results are shown in Fig. …, based on the precision-recall curve.

We measure the performance of our word spotting system using precision, recall and mean average precision (MAP). Precision and recall are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP is the number of true positives, FN the number of false negatives and FP the number of false positives. The MAP value is evaluated as the area under the recall-precision curve.

In our experiments, … Gaussian mixtures and … states provided the optimum results. We evaluated word spotting performance considering both a local threshold and a global threshold; the results are shown in Fig. …. For the local threshold, a single image is considered when optimizing the threshold value, whereas for the global threshold one standard value is used for all query keywords. We considered a total of … keywords for our word spotting performance evaluation in Bangla and Devanagari scripts.
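The evaluation measures defined above can be sketched as follows. This is an illustrative implementation under assumptions, not the evaluation script used in the paper: the area under the precision-recall curve is approximated by summing precision times the recall increment along the ranked list, ties between scores are not treated specially, and all function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pr_curve(scores, labels):
    """Precision/recall pairs obtained by lowering the threshold one line at a time.

    `scores` are text line scores; `labels` are 1 if the line really contains
    the keyword and 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    tp = fp = 0
    points = []
    for _, is_hit in ranked:
        if is_hit:
            tp += 1
        else:
            fp += 1
        points.append(precision_recall(tp, fp, positives - tp))
    return points

def average_precision(scores, labels):
    """Area under the precision-recall curve (rectangle rule over recall steps)."""
    area, previous_recall = 0.0, 0.0
    for precision, recall in pr_curve(scores, labels):
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area

def mean_average_precision(per_keyword):
    """Mean of the per-keyword average precision; `per_keyword` is a list of (scores, labels)."""
    return sum(average_precision(s, l) for s, l in per_keyword) / len(per_keyword)

# Toy ranking for one keyword: hits at ranks 1 and 3 out of 4 lines.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

A local threshold corresponds to evaluating each keyword's ranked list separately, as `average_precision` does; a global threshold would instead pool the scores of all keywords into one ranked list before computing the curve.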

Fig. … Comparative study of word spotting performance on (a) Bangla and (b) Devanagari scripts with different numbers of keywords.

A comparative study of two different features, LGH and PHOG, is also performed to check the efficiency of our feature extraction approach of concatenating foreground and background information. The LGH feature […], which was found to be useful for Latin text, gives an accuracy close to that of the PHOG feature. For LGH features with … bins, we obtained a …-dimensional feature vector for each sliding window position. Word spotting performance is slightly better with the PHOG feature than with the LGH feature. The MAP values are given in Table 3. Our proposed word spotting framework has also been tested on the IAM dataset of English script. A detailed analysis of the results on the IAM dataset is shown in Table 4.

Table 3. Comparison of LGH and PHOG features in word spotting using global threshold (MAP)

Script           LGH      PHOG
Bangla           51.84    52.64
Devanagari       52.55    53.71
IAM (English)    48.04    48.98

Table 4. Analysis on the IAM dataset for English

Feature                    MAP (Local)    MAP (Global)
Foreground Information     69.58          48.98
Background Information     44.28          32.19
Foreground + Background    72.28          51.87

We have also checked the performance using keywords of different lengths, considering the global threshold; the results are shown in Fig. …. The computation time for detecting the occurrence of a given keyword in a particular text line is … seconds for Bangla and … seconds for Devanagari, using an Intel(R) Pentium(R) CPU at … GHz with … GB RAM.

Fig. … Word spotting performance using keywords of different lengths.

Conclusion and future work

In this paper we have proposed a novel feature extraction method that combines foreground and background features for word spotting. We noted that the PHOG feature outperformed the LGH feature in word spotting performance. Line-level word spotting gives better performance than the word-segmentation approach when the gap between two consecutive words is not regular. In future, we shall work on a time-efficient approach for word spotting.

References

[1] A. Vinciarelli, “A survey on offline cursive word recognition”, Pattern Recognition, Vol. …, pp. …–….
[2] S. Wshah, G. Kumar and V. Govindaraju, “Script independent word spotting in offline handwritten documents based on hidden Markov models”, In Proc. ICFHR, pp. ….
[3] T. M. Rath and R. Manmatha, “Word spotting for historical documents”, International Journal of Document Analysis and Recognition, Vol. …, pp. …–….
[4] …
[5] A. Fischer et al., “Lexicon-free handwritten word spotting using character HMMs”, Pattern Recognition Letters, Vol. …, pp. …–….
[6] P. P. Roy, U. Pal and J. Lladós, “Morphology Based Handwritten Line Segmentation Using Foreground and Background Information”, In Proc. ICFHR, pp. ….
[7] A. K. Bhunia, A. Das, P. P. Roy and U. Pal, “A Comparative Study of Features for Handwritten Bangla Text Recognition”, In Proc. ICDAR.
[8] S. Thomas, C. C. L. Heutte and T. Paquet, “An Information Extraction Model for Unconstrained Handwritten Documents”, In Proc. ICPR, pp. …–….
[9] V. Frinken, A. Fischer, R. Manmatha and H. Bunke, “A Novel Word Spotting Method Based on Recurrent Neural Networks”, IEEE Trans. Pattern Anal. Mach. Intell., Vol. …, pp. ….
[10] M. Rusinol et al., “Browsing heterogeneous document collections by a segmentation free word spotting method”, In Proc. ICDAR, pp. …–….
[11] P. P. Roy, U. Pal and J. Lladós, “Text Line Extraction in Graphical Documents using Background and Foreground Information”, International Journal of Document Analysis and Recognition, Vol. …, pp. ….
[12] …
[13] P. P. Roy, J. …
[14] S. …
[15] J. R. Serrano and F. Perronnin, “Handwritten word-spotting using hidden Markov models and universal vocabularies”, Pattern Recognition, Vol. …, pp. ….