Invitation for Request for Expressions of Interest    Status : Corrigendum

GOVERNMENT OF THE PEOPLE'S REPUBLIC OF BANGLADESH

Ministry/Division : Ministry of Post, Telecommunications & Information Technology
Agency : Bangladesh Computer Council (BCC)
Procuring Entity Name : Executive Director, Bangladesh Computer Council
Procuring Entity Code : 3039
Procuring Entity District : Dhaka
Expression of Interest for Selection of : Consulting Firm (National) (Lump-Sum)
Title Of Service : Bangla Syntactic Treebank Corpus with Processing Pipeline and Distribution Platform (SD 07)
EOI Ref. No. : 56.01.0000.029.32.29.19(Part-01)-280
Date : 08/07/2020

KEY INFORMATION

Procurement Sub-Method : Quality and Cost Based Selection (QCBS)

FUNDING INFORMATION

Budget and Source of Funds : Development Budget GOB
Development Partners : None

PARTICULAR INFORMATION

Project/Programme Name : Enhancement of Bangla Language in ICT through Research & Development Project (1st revised)
EOI Closing Date and Time : 26/07/2020 11:00 AM
Publication Date : 09/07/2020

INFORMATION FOR APPLICANT

Brief Description of Assignment : The following points describe, in brief, the scope of development of the ‘Bangla Syntactic Treebank Corpus with Processing Pipeline and Distribution Platform’:
1. The Bangla language corpus must be annotated electronic text with syntactic treebank features.
2. The corpus must be representative, balanced, and cover all necessary domains with rich metadata. The corpus should also follow Zipf’s law and have an acceptable type-token ratio (TTR). At least 33% of the gold and silver data should be non-scripted, derived from an oral corpus, and representative of real-world data.
3. The size of the annotated corpus must be at least 10M gold standard and 90M silver standard. The gold and silver standard corpora must follow the annotation pipeline, including tokenization, NER, PoS tagging, lemmatization, dependency parsing, phrase detection, coreference resolution features, etc. The standard of annotation and the processing pipeline will be endorsed by BCC.
4. The gold standard must be manually annotated and cover inter-annotator agreement (IAA); the silver standard will be defined by single human annotation. The annotation will be performed by undergraduate (UG) students and validated by a supervisor nominated by the department. All annotation activities will be performed jointly by two departments under an MoU with the vendor: one from the technology cluster (i.e. CSE, IIT, ICT, EEE, ECS) and another from the language cluster (i.e. Bangla, Linguistics, English).
5. The vendor must also provide 10 billion non-annotated raw tokens of running text to build word and sentence embeddings.
6. The vendor must develop word embedding models using at least 10B tokens to make the Bangla corpus a real-world dataset, including: Count Vectors, TF-IDF, Co-Occurrence Matrix, CBOW, Skip-Gram, Word2vec. The embeddings should follow the standard of state-of-the-art technology such as ELMo, BERT, GPT-2 and GPT-3 embeddings.
7. The vendor should develop an integrated (web-based) system for corpus collection and a processing pipeline with a distribution platform, including modules such as:
   a. Crowd-sourcing, crawling
   b. Data cleaner, tokenizer, lemmatizer, parser, chunker, PoS tagger, NER engine, coreference resolution, WordNet, dictionary for public use
   c. Corpus distribution web platform with API, and an admin panel to manage the corpus platform
   All modules should have separate APIs so that users can use the modules as they require. The system should have text analytics features including word/phrase frequency, wordlists, N-grams, concordance, KWIC, collocation, all vector representations, etc. The web-based corpus distribution platform should be able to read, view, use, search, filter, sort, arrange, export, import, store and analyze text, image and speech data.
8. The product will be tested in the different phases of software development by the team of nominated consultants.
9. All the firms should resolve the intellectual property issues.
10. The firm must maintain the specified standard, which will be determined by the Procuring Entity; details will be given in the RFP.
11. All deliverables of the component will be government property as per PPR and PPA. No second- or third-party branding or ownership is allowed. The developed products shall carry the logos and links of the ICT Division, BCC and the Project only.
EBLICT Project now invites eligible firms to indicate their interest in providing the services. Interested firms are invited to provide information indicating that they are qualified to perform the services as mentioned in serial number 17: Experience, Resources and Delivery Capacity Required. This will require substantiation through submission of brochures and other documents describing similar assignments, experience, availability of appropriate professional qualifications and experience among the applicant’s staff, resources to carry out the assignment, financial capability, etc.
A firm may associate with other firms to fulfill the qualifications. A shortlist of firms will be prepared upon evaluation of the EOIs of the eligible firms, and “Request for Proposal” documents will be issued in their favor. A firm will be selected using the Quality and Cost Based Selection (QCBS) method. It is expected that the services will commence in September 2020.
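For orientation only, the corpus measures named in points 2, 4 and 7 above (TTR, IAA and KWIC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the standard that BCC will endorse: the function names are hypothetical, Cohen's kappa is assumed as one possible IAA measure for two annotators, and the input is assumed to be already tokenized.

```python
from collections import Counter


def type_token_ratio(tokens):
    """TTR: distinct word forms divided by total tokens (0.0 if empty)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: kappa is used here as one example of an IAA measure;
    the notice itself does not name a specific statistic.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def kwic(tokens, keyword, window=2):
    """Keyword-in-context rows: (left context, keyword, right context)."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows
```

As a usage example, `type_token_ratio("আমি ভাত খাই আমি বই পড়ি".split())` divides five distinct forms by six tokens, and `kwic(tokens, "আমি", window=1)` returns one row per occurrence of the keyword with one token of context on each side.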
Experience, Resources and Delivery Capacity Required : This is a national project of utmost importance towards achieving the Vision 2021: Digital Bangladesh. The firms must prove that they have a solid technical background and the operational strength to undertake this work and move it forward without hindrance. The firms must also have adequate technical ability, resources, and processes. As such, the following are defined as minimum eligibility criteria:
1. The firm must have a minimum of 5 years' experience in software and services in Bangladesh;
2. The firm must have practical experience of developing NLP-based software / linguistics big data (text, image, signal) processing software / linguistics software;
3. Experience in annotated corpus development and an MoU for research with local universities will be considered an added advantage;
4. The firm must have a sufficient number (30; please check this number in the RFP) of full-time key personnel such as ML engineers, data scientists, software developers and Bangla language specialists (at high, medium and low levels) with experience in developing high-quality applications;
5. The firm must have a sufficient amount (500 Lakh) of liquid assets, i.e., working capital or credit line(s), supported by competent documents;
6. The firm must have up-to-date audit reports for the previous 5 years. The firm must submit yearly turnover reports/documents;
7. The firm should have valid, up-to-date trade license, income tax and VAT certificates;
8. JVCA is allowed as per PPA-2006 & PPR-2008, but all the firms must be Bangladeshi firms.
Other Details (if applicable) : The REOI will be reviewed on the basis of the following:
1. Experience of the firm(s) in NLP / annotated corpus development
2. CVs of key professionals
3. Turnover of the consulting firm
4. Other documents submitted by the firm
5. History of litigation (if any) in courts or any arbitration proceedings
Interested firms may obtain further information from the Project Office from 10:00 AM to 4:30 PM on any working day before the closing date. Firms shall submit 2 (two) copies of the EOI and a forwarding letter in a sealed envelope labeled “Re-EOI for Development of Bangla Syntactic Treebank Corpus with Processing Pipeline and Distribution Platform” to the following address. More information on this project can be obtained from http://www.bcc.gov.bd/
Association with foreign firms is : Not Applicable
EOI Detail Information
Ref No : 56.01.0000.029.32.29.19(Part-01)-280
Phasing Of Services : NA
Location : Dhaka
Start Date : September 2020
Completion Date : June 2021

PROCURING ENTITY DETAILS

Name of Official Inviting EOI : Dr. Md. Ziauddin
Designation of Official Inviting EOI : Project Director, Enhancement of Bangla Language in ICT through Research & Development Project (1st Revised)
Address of Official Inviting EOI : 8th Floor, ICT Tower, E-14/X, Agargaon, Sher-e-Bangla Nagar, Dhaka-1207
Contact details of Official Inviting EOI : Phone : 55006880, Fax : 912462, Email : pdeblict@bcc.gov.bd

Advertisement Corrigendum(s)

Date Of Corrigendum : 22/07/2020

Experience, Resources and Delivery Capacity Required : This is a national project of utmost importance towards achieving the Vision 2021: Digital Bangladesh. The firms must prove that they have a solid technical background and the operational strength to undertake this work and move it forward without hindrance. The firms must also have adequate technical ability, resources, and processes. As such, the following are defined as minimum eligibility criteria:
1. The firm must have a minimum of 5 years' experience in software and services in Bangladesh;
2. The firm must have practical experience of developing NLP-based software / linguistics big data (text, image, signal) processing software / linguistics software;
3. Experience in annotated corpus development and an MoU for research with local universities will be considered an added advantage;
4. The firm must have a sufficient number (30) of full-time key personnel such as ML engineers, data scientists, software developers and Bangla language specialists (at high, medium and low levels) with experience in developing high-quality applications;
5. The firm must have a sufficient amount (500 Lakh) of liquid assets, i.e., working capital or credit line(s), supported by competent documents;
6. The firm must have up-to-date audit reports for the previous 5 years. The firm must submit yearly turnover reports/documents;
7. The firm should have valid, up-to-date trade license, income tax and VAT certificates;
8. JVCA is allowed as per PPA-2006 & PPR-2008, but all the firms must be Bangladeshi firms.
Last Date Of Submission : 06/08/2020
Last Time Of Submission : 11:00 AM
Status : Active
The procuring entity reserves the right to accept or reject all tenders.