Text Lab 1 3 9 – A Text Transformation Toolkit

By Steven Black

Introduction

This article serves to introduce, illustrate, and explore some of the great ( and not so great ) string handling capabilities of Visual FoxPro.

I always seem to be involved with solving many text-data related problems in my VFP projects. On the surface, handling text isn't very sexy and seemingly not very interesting. I think otherwise, and I hope you'll agree.

This document is split into three sections: Inbound is about getting text into the VFP environment so you can work with it. Processing is about manipulating the text, and Outbound is about sending text on its way when you're done.

To illustrate text handling in VFP, I am using the complete text of Tolstoy's War And Peace, included on the conference CD as WarAndPeace.TXT, which, along with thousands of other works of literature, is available on the web.

This article was originally written using Visual FoxPro version 6, and has since been updated for VFP 7 and VFP 8.

Some facts about VFP strings

Here are a few things you need to know about VFP strings:

In functional terms, there is no difference between a character field and a memo field. All functions that work on characters also work on memos.

The maximum number of characters that VFP can handle in a string is 16,777,184.

Inbound

This section is all about getting text into your processing environment.

Inbound text from table fields

To retrieve text from a table field, simply assign it to a memory variable.
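
For example ( the table and field names here are made up ):

    SELECT MyTable
    cNotes = MyTable.MyMemoField     && the memo field's contents are now a plain string in memory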

Inbound from text files

There are many ways to retrieve text from files on disk.

FILETOSTR( cFileName ) is used to place the contents of a disk file into a string memory variable. This is among my favorite new functions in VFP 6. It's both useful and fast. For example, the following code executes in one-seventh of a second on my 220 MHz Pentium laptop.
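
The original listing isn't reproduced in this copy; a minimal sketch of that timing test, using the WarAndPeace.TXT file mentioned above, might look like this:

    nStart    = SECONDS()
    cWarPeace = FILETOSTR( "WarAndPeace.TXT" )
    ? "Loaded", LEN( cWarPeace ), "bytes in", SECONDS() - nStart, "seconds"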

In other words, on a very modest laptop ( by today's standards ) VFP can load the full text of Tolstoy's War And Peace in one-seventh of a second.

Low Level File Functions ( LLFF ) are somewhat more cumbersome but offer great control. LLFF are also very fast. The following example reads the entire contents of Tolstoy's War And Peace from disk into memory:
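
A sketch of the equivalent LLFF version ( again a reconstruction, not the original listing ):

    nStart  = SECONDS()
    nHandle = FOPEN( "WarAndPeace.TXT" )
    nSize   = FSEEK( nHandle, 0, 2 )         && seek to the end of the file to learn its size
    = FSEEK( nHandle, 0, 0 )                 && back to the beginning
    cWarPeace = FREAD( nHandle, nSize )
    = FCLOSE( nHandle )
    ? "Loaded", LEN( cWarPeace ), "bytes in", SECONDS() - nStart, "seconds"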

Given the similar execution times, I think we can conclude that internally, LLFF and FILETOSTR() are implemented similarly. However with the LLFF we also have fine control. For example, FGETS() allows us to read a line at a time. To illustrate, the following code reads the first 15 lines of War And Peace into array wpLines.
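
Something along these lines ( a sketch in the spirit of the missing listing ):

    DIMENSION wpLines[ 15 ]
    nHandle = FOPEN( "WarAndPeace.TXT" )
    FOR i = 1 TO 15
       wpLines[ i ] = FGETS( nHandle )       && FGETS() returns the next line and advances the pointer
    ENDFOR
    = FCLOSE( nHandle )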

We can also retrieve a segment from War And Peace. FSEEK() moves the LLFF file pointer, and the FREAD() function is used to read a range. Let's read, say, 1000 bytes about halfway through the book.
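
A sketch of that idea:

    nHandle = FOPEN( "WarAndPeace.TXT" )
    nSize   = FSEEK( nHandle, 0, 2 )             && file size
    = FSEEK( nHandle, INT( nSize / 2 ), 0 )      && jump to roughly the middle
    cChunk  = FREAD( nHandle, 1000 )             && read 1000 bytes from there
    = FCLOSE( nHandle )
    ? cChunk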

Inbound from text files, with pre-processing

Sometimes you need to pre-process text before it is usable. For example, you may have an HTML file from which you need to clean and remove tags. Or maybe you have the problem exhibited by our copy of War and Peace, which has embedded hard-returns at the end of each line. How can we create a streaming document that we can actually format?

Often the answer is to use the APPEND FROM command, which imports from a file into a table and supports a large variety of file formats. The strategy always works something like this: you create a single-field table, and you use APPEND FROM ... TYPE SDF to load it.
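
A sketch of the idea ( a cursor serves as the single-field table here; lines longer than the field width would be truncated ):

    CREATE CURSOR Import ( TheText C( 254 ) )
    APPEND FROM WarAndPeace.TXT TYPE SDF
    ? RECCOUNT()                                 && one record per line of the source file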

Now you're good to go: you've got a table of records that you can manipulate and transform to your heart's content using VFP's vast collection of functions.

Processing

This section discusses a wide variety of string manipulation techniques in Visual FoxPro. Let's say we've got some text in our environment; now let's muck with it.

Does a sub-string exist?

There are many ways to determine if a sub-string exists in a string. The $ operator returns True or False depending on whether a sub-string is contained in a string. It is fast. Try this:
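
For instance ( a small sketch, not the article's original listing ):

    cWarPeace = FILETOSTR( "WarAndPeace.TXT" )
    ? "Napoleon" $ cWarPeace          && .T. -- he turns up fairly often
    ? "Qwertyuiop" $ cWarPeace        && .F.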

The AT() and ATC() functions are also great for determining if a sub-string exists, the latter having the advantage of being case-insensitive; moreover, their return values give you the exact position of the sub-string.

The OCCURS() function will also tell you if a sub-string exists, and moreover tell you how many times the sub-string occurs. This code will count the number of occurrences of a variety of sub-strings in War And Peace.
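
A sketch of such a counting loop, with a made-up assortment of sub-strings:

    DIMENSION aWords[ 4 ]
    aWords[ 1 ] = "Napoleon"
    aWords[ 2 ] = "Moscow"
    aWords[ 3 ] = "Russia"
    aWords[ 4 ] = "peace"
    FOR i = 1 TO 4
       ? aWords[ i ], OCCURS( aWords[ i ], cWarPeace )
    ENDFOR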

Locating sub-strings in strings is something VFP does really well.

Locating sub-strings

One of the basic tasks in almost any string manipulation is locating sub-strings within larger strings. Four useful functions for this are AT(), RAT(), ATC(), and RATC(). These return the ordinal position of a sub-string searching from the left ( AT() ) or from the right ( RAT() ), and both have case-insensitive variants ( ATC() and RATC() ). All these functions are very fast and scale well with file size. For example, let's go looking for THE END in War And Peace.
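
A sketch of that search ( assuming the book is already loaded into cWarPeace ):

    ? AT( "THE END", cWarPeace )          && position from the left; 0 means not found
    ? RAT( "THE END", cWarPeace )         && position from the right
    ? ATC( "the end", cWarPeace )         && case-insensitive variant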

You can also check for the nth occurrence of a sub-string, as illustrated below where we find the 1st, 101st, 201st...701st occurrence of the word Russia in War And Peace.
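
Something like this ( a reconstruction ):

    FOR nOcc = 1 TO 701 STEP 100
       ? nOcc, AT( "Russia", cWarPeace, nOcc )       && the third parameter asks for the nth occurrence
    ENDFOR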

Two other functions are useful for locating strings: ATLINE() and ATCLINE(). These return the line number of the first occurrence of a string.

Note: Prior to VFP 7, functions that are sensitive to SET MEMOWIDTH, like ATLINE() and ATCLINE(), among others, are dog-slow on larger strings and so do not scale well at all.

Traversing text line-by-line

Iterating through text, one line at a time, is a common task. Here's the way VFP developers have been doing it for years: using the MEMLINES() and MLINE() functions. Like this:
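
A sketch of that classic loop, assuming the text to traverse is in a variable cText:

    nStart = SECONDS()
    FOR i = 1 TO MEMLINES( cText )
       cLine = MLINE( cText, i )
    ENDFOR
    ? SECONDS() - nStart, "seconds"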

That's pathetic performance: 20+ seconds to iterate through 767 lines! Fortunately, there's a trick to using MLINE(), which is to pass the _MLINE system memory variable as the third parameter. Like this:
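
A sketch of the _MLINE idiom:

    nStart = SECONDS()
    _MLINE = 0
    FOR i = 1 TO MEMLINES( cText )
       cLine = MLINE( cText, 1, _MLINE )     && _MLINE is advanced past each returned line automatically
    ENDFOR
    ? SECONDS() - nStart, "seconds"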

Now that's more like it: a fifty-fold improvement. A surprising number of VFP developers don't know this idiom with _MLINE, even though it's been documented in the FoxPro help since version 2 at least.

Starting in VFP 6 all this is obsolete, since ALINES() is a screaming new addition to the language. Let's see how these routines look and perform with ALINES().
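
Roughly like this:

    nStart = SECONDS()
    nLines = ALINES( aLines, cText )         && the array is created and sized for you
    FOR i = 1 TO nLines
       cLine = aLines[ i ]
    ENDFOR
    ? SECONDS() - nStart, "seconds"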

Another twenty-fold improvement in speed. I think the lesson is clear: if you are using MLINE() in your applications and you are using VFP 6, then it's time to switch to ALINES(). There are just two major differences: first, ALINES() is limited by VFP's 65,000 array element limit, and second, successive lines delimited only by CHR( 13 ) carriage returns are treated as one line. For example:
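
The original example isn't included in this copy, but a quick sketch lets you observe the difference yourself:

    cCRonly = "line one" + CHR( 13 ) + CHR( 13 ) + "line three"
    cCRLF   = "line one" + CHR( 13 ) + CHR( 10 ) + CHR( 13 ) + CHR( 10 ) + "line three"
    ? ALINES( aCR, cCRonly )        && how many lines when the blank line is marked by a bare CHR( 13 )?
    ? ALINES( aCRLF, cCRLF )        && and when CR+LF pairs are used?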

But if you use carriage return + line feed, CHR( 13 )+CHR( 10 ), you'll get the results you expect.

This is a bit unnerving if blank lines are important, so beware and use CHR( 13 )+CHR( 10 ) to avoid this problem.

Now, just for fun, let's rip through War And Peace using ALINES().
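
A sketch of that exercise:

    nStart = SECONDS()
    nLines = ALINES( wpLines, cWarPeace )
    FOR i = 1 TO nLines
       cLine = wpLines[ i ]
    ENDFOR
    ? nLines, "lines in", SECONDS() - nStart, "seconds"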

Excuse me, but wow: considering we're creating a 54,337-element array from a file on disk, then we're traversing the entire array assigning each element's contents to a memory variable, and we're back in 3.4 seconds.

What about just creating the array of War And Peace:
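
That is, just the ALINES() call by itself ( a sketch ):

    nStart = SECONDS()
    nLines = ALINES( wpLines, FILETOSTR( "WarAndPeace.TXT" ) )
    ? nLines, "lines in", SECONDS() - nStart, "seconds"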

So, on my Pentium 233 laptop using VFP 6, we can load War and Peace from disk into a 54,000-item array in 2.2 seconds. On my newer desktop machine, a Pentium 500, this task is sub-second.

Traversing text word-by-word

You could iteratively traverse a string word-by-word by using, among other things, the return values from AT( , x, n ) and SUBS( , , ), but if you are doing that, you're missing a great and little-known feature of VFP.

Two new functions are great for word-by-word text processing: the GETWORDCOUNT() and GETWORDNUM() functions return the number of words and individual words, respectively.

Prior to VFP 7, use the Words() and WordNum() functions, which are available when you load the FoxTools.FLL library; they return the number of words and individual words, respectively.

Let's see how they perform. Let's first count the words in War And Peace.
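
A sketch of the word count ( VFP 7 or later ):

    nStart = SECONDS()
    ? GETWORDCOUNT( cWarPeace ), "words in", SECONDS() - nStart, "seconds"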

The GETWORDCOUNT() function is also useful for counting all sorts of tokens since you can pass the word delimiters in the second parameter. How many sentences are there in War And Peace?
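
One way to sketch it: treat the sentence-ending punctuation marks as the delimiters.

    ? GETWORDCOUNT( cWarPeace, ".?!" )     && runs of text between . ? and ! are counted as "words"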

GETWORDNUM() returns a specific word from a string. What's the 666th word in War And Peace? What about the 500,000th?
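
For example ( a sketch ):

    ? GETWORDNUM( cWarPeace, 666 )
    ? GETWORDNUM( cWarPeace, 500000 )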

Similarly to GETWORDCOUNT(), we can use GETWORDNUM() to return a token from a string by specifying the delimiter. What's the 2000th sentence in War And Peace?
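
Along the same lines as the sentence count above:

    ? GETWORDNUM( cWarPeace, 2000, ".?!" )    && the 2000th run of text between sentence-ending marks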

Substituting text

VFP has a number of useful functions for substituting text: STRTRAN(), CHRTRAN(), CHRTRANC(), STUFF(), and STUFFC().

STRTRAN() replaces occurrences of a string with another. For example, let's change all occurrences of Anna to the McBride twins in War And Peace.
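
A sketch of that substitution, with timing:

    nStart = SECONDS()
    cNew = STRTRAN( cWarPeace, "Anna", "the McBride twins" )
    ? OCCURS( "the McBride twins", cNew ), "replacements in", SECONDS() - nStart, "seconds"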

That's over 125 replacements per second, which is phenomenal. What about removing strings?
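
Removing is just a substitution with an empty replacement:

    cNew = STRTRAN( cWarPeace, "Anna", "" )      && every occurrence of Anna disappears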

So it appears that STRTRAN() both adds and removes strings with equal aplomb. What of CHRTRAN(), which swaps characters? Let's, say, change all s to ch in War and Peace.
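
The original listing isn't shown here. Note that CHRTRAN() maps characters position-for-position between its second and third arguments, so a one-character search string can only map onto a single replacement character; the sketch below therefore maps both cases of s onto single characters:

    nStart = SECONDS()
    cNew = CHRTRAN( cWarPeace, "sS", "cC" )      && each 's' becomes 'c', each 'S' becomes 'C'
    ? SECONDS() - nStart, "seconds"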

Which isn't bad considering that there are 159,218 occurrences of the character s in War And Peace.

However, don't try to use CHRTRAN() when the third parameter ( the replacement ) is an empty string. The performance of CHRTRAN() in these circumstances is terrible. If you need to suppress sub-strings, use STRTRAN() instead.
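
A sketch of the two approaches to stripping a character:

    cSlow = CHRTRAN( cWarPeace, "x", "" )        && legal, but painfully slow on a string this size
    cFast = STRTRAN( cWarPeace, "x", "" )        && the fast way to remove every occurrence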

String Concatenation

VFP has tremendous concatenation speed if you use it in a particular way. Since many common tasks, like building web pages, involve building documents one element at a time, you should know that string expressions of the form x = x+y are very fast in VFP. Consider this:
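
A sketch of such a timing test ( the loop count is arbitrary ):

    nStart = SECONDS()
    x = ""
    FOR i = 1 TO 100000
       x = x + "All happy families are alike; "
    ENDFOR
    ? LEN( x ), "bytes in", SECONDS() - nStart, "seconds"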

The same type of performance applies if you build strings a small chunk at a time, which is a typical scenario in dynamic Web pages, whether a template engine or raw output is used. For example:
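
A sketch of that scenario, building a fragment of HTML one small piece at a time:

    cHTML = "<html><body><table>" + CHR( 13 ) + CHR( 10 )
    FOR i = 1 TO 10000
       cHTML = cHTML + "<tr><td>Row " + TRANSFORM( i ) + "</td></tr>" + CHR( 13 ) + CHR( 10 )
    ENDFOR
    cHTML = cHTML + "</table></body></html>"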

This full optimization occurs as long as the string is adding something to itself and as long as the string being concatenated is stored in a variable. Using class properties is somewhat less efficient. The optimization does not occur if the first expression on the right of the = sign is not the same as the string being concatenated. So this:
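
The offending line isn't shown in this copy; presumably it was of this form, where the variable being built is not the first operand on the right:

    x = "All happy families are alike; " + x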

is not optimized in this fashion. The above line, placed in the example above, takes 25 seconds! So appending strings to strings is blazingly fast in most common situations.

Outputting text

So you've got text, maybe a lot of it. What are your options for writing it to disk?

Foremost, there's the new STRTOFILE() function, which creates a disk file with the contents of a string. Let's write War And Peace to disk.
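
A sketch with timing ( the output file name is made up ):

    nStart = SECONDS()
    = STRTOFILE( cWarPeace, "WarAndPeace_copy.TXT" )
    ? SECONDS() - nStart, "seconds"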

Which means that you can dish 3+ MB to disk in about half a second.

You can also use Low Level File Functions ( LLFF ) to output text. The FWRITE() function dumps all or part of a string to disk. The FPUTS() function outputs a single line at a time, moving the file pointer along as it goes.
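
A sketch of the FWRITE() route ( output file name made up ):

    nHandle = FCREATE( "WarAndPeace_fwrite.TXT" )
    = FWRITE( nHandle, cWarPeace )
    = FCLOSE( nHandle )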

Here again, the similar performance times between FWRITE() and STRTOFILE() are striking, just as they were when comparing FREAD() and FILETOSTR().

Here's an example of outputting War And Peace line-by-line using FPUTS(). Since we're using ALINES(), it's not that onerous a task. In fact, it's very slick!
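
Something like this ( a reconstruction; the output file name is made up ):

    nLines  = ALINES( wpLines, cWarPeace )
    nHandle = FCREATE( "WarAndPeace_lines.TXT" )
    FOR i = 1 TO nLines
       = FPUTS( nHandle, wpLines[ i ] )       && writes the line plus a carriage return / line feed
    ENDFOR
    = FCLOSE( nHandle )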

Conclusion

So, there you have it: a cafeteria-style tour of VFP's text handling capabilities. I personally think that most of the code snippets I've shown here have amazing, borderline unbelievable execution speeds. I hope I've been able to show that VFP really excels at string handling.

  • 1 Full system
    • 1.1 Multilingual
    • 1.2 Language specific
  • 2 Front end (NLP part)
    • 2.1 Front end inc G2P
    • 2.2 Text normalization
    • 2.3 Dictionary related tools
  • 3 Backend (Acoustic part)
    • 3.2 HMM based
    • 3.3 DNN based
    • 3.4 Wavenet based
  • 4 End-to-end (text to audio)
  • 5 Signal processing
    • 5.1 Vocoder, Glottal modelling
    • 5.2 Pitch extractor
    • 5.3 Sample modelling
    • 5.4 Toolkits
  • 6 Singing synthesizer
  • 7 Ebook reader
  • 8 Various tools
  • 9 Articulatory synthesizer
  • 10 API/Library
  • 11 Visualization & annotation tools
  • 12 Resources
    • 12.1 Dictionary

Full system

Multilingual

Festival

Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Tools and documentation for building new voices are available through Carnegie Mellon's FestVox project.

  • Last update: 2015/01/06
  • Link: http://www.cstr.ed.ac.uk/downloads/festival/2.4/
  • Reference:

FreeTTS

FreeTTS is a speech synthesis system written entirely in the Java programming language. It is based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.

  • Last update: 2009-03-09
  • Link: http://freetts.sourceforge.net/docs/index.php
  • Reference:

MBROLA

The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons (Belgium), is to obtain a set of diphone-based speech synthesizers for as many languages as possible, and provide them free for non-commercial applications.

  • Last update:
  • Link: http://tcts.fpms.ac.be/synthesis/mbrola.html
  • Reference:

MARY

MARY is a multi-lingual (German, English, Tibetan) and multi-platform (Windows, Linux, Mac OS X and Solaris) speech synthesis system. It comes with an easy-to-use installer - no technical expertise should be required for installation. It enables expressive speech synthesis, using both diphone and unit-selection synthesis.

  • Last update: 2017/09/26
  • Link: http://mary.dfki.de/
  • Reference:

AhoTTS

Text-to-Speech converter for Basque, Spanish, Catalan, Galician and English. It includes linguistic processing and built voices for all the languages aforementioned. Its acoustic engine is based on htsengine and it uses a high quality vocoder called AhoCoder.

  • Last update: 2015/07/15
  • Link: https://sourceforge.net/projects/ahottsmultiling/

Language specific

AHOTTS (Basque & Spanish)

Text-to-Speech converter for Basque and Spanish. It includes linguistic processing and built voices for the languages aforementioned. Its acoustic engine is based on htsengine and it uses a high quality vocoder called AhoCoder.

  • Last update: 2016/04/07
  • Link: https://sourceforge.net/projects/ahotts
  • Link2: https://sourceforge.net/projects/ahottsiparrahotsa/ (for Lapurdian dialect of Basque.)
  • Reference:

RHVoice (Russian)

RHVoice is a free and open source speech synthesizer.

  • Last update: 2017/09/24
  • Link: https://github.com/Olga-Yakovleva/RHVoice

Front end (NLP part)

Front end inc G2P

SiRE

(Si)mply a (Re)search front-end for Text-To-Speech Synthesis. This is a research front-end for TTS. It is incomplete, inconsistent, badly coded and slow. But it is useful for me and should slowly develop into something useful to others.

  • Last update: 2016/10/11
  • Link: https://github.com/RasmusD/SiRe

Phonetisaurus

This repository contains scripts suitable for training, evaluating and using grapheme-to-phoneme models for speech recognition using the OpenFst framework. The current build requires OpenFst version 1.6.0 or later, and the examples below use version 1.6.2.

The repository includes C++ binaries suitable for training, compiling, and evaluating G2P models. It also includes some simple Python bindings which may be used to extract individual multigram scores and alignments, and to dump the raw lattices in .fst format for each word.

  • Last update: 2017/09/17
  • Link: https://github.com/AdolfVonKleist/Phonetisaurus

Ossian

Ossian is a collection of Python code for building text-to-speech (TTS) systems, with an emphasis on easing research into building TTS systems with minimal expert supervision. Work on it started with funding from the EU FP7 Project Simple4All, and this repository contains a version which is considerably more up-to-date than that previously available. In particular, the original version of the toolkit relied on HTS to perform acoustic modelling. Although it is still possible to use HTS, it now supports the use of neural nets trained with the Merlin toolkit as duration and acoustic models. All comments and feedback about ways to improve it are very welcome.

  • Last update: 2017/09/15
  • Link: https://github.com/CSTR-Edinburgh/Ossian

SALB

The SALB system is a software framework for speech synthesis using HMM based voice models built by HTS (http://hts.sp.nitech.ac.jp/). See a more generic description on http://m-toman.github.io/SALB/.

The package currently includes:

A C++ framework that abstracts the backend functionality and provides a SAPI5 interface, a command line interface and a C++ API.

Backend functionality is provided by

  • an internal text analysis module for (Austrian) German,
  • flite as text analysis module for English and
  • htsengine for parameter generation/synthesis. (see COPYING for information on 3rd party libraries)

Also included is an Austrian German male voice model.

  • Last update: 2016/11/14
  • Link: https://github.com/m-toman/SALB

Sequence-to-Sequence G2P toolkit

The tool does Grapheme-to-Phoneme (G2P) conversion using recurrent neural network (RNN) with long short-term memory units (LSTM). LSTM sequence-to-sequence models were successfully applied in various tasks, including machine translation [1] and grapheme-to-phoneme [2].

This implementation is based on Python TensorFlow, which allows efficient training on both CPU and GPU.

  • Last update: 2017/03/28
  • Link: https://github.com/cmusphinx/g2p-seq2seq

Text normalization

Sparrowhawk

Sparrowhawk is an open-source implementation of Google's Kestrel text-to-speech text normalization system. It follows the discussion of the Kestrel system as described in:

Ebden, Peter and Sproat, Richard. 2015. The Kestrel TTS text normalization system. Natural Language Engineering, Issue 03, pp 333-353.

After sentence segmentation (sentenceboundary.h), the individual sentences are first tokenized, with each token being classified, and then passed to the normalizer. The system can output an unannotated string of words; richer annotation with links between input tokens, their input string positions, and the output words is also available.


  • Last update: 2017/07/25
  • Link: https://github.com/google/sparrowhawk

ASRT

This is the README for the Automatic Speech Recognition Tools.

This project contains various scripts in order to facilitate the preparation of ASR related tasks.

Current tasks are:

  1. Sentence extraction from PDF files
  2. Sentence classification by language
  3. Sentence filtering and cleaning

Document sentences can be extracted in single document or batch mode.

For an example on how to extract sentences in batch mode, please have a look at the rundatapreparationtask.sh script located in examples/bash directory.

For an example on how to extract sentences in single document mode, please have a look at the rundatapreparation.sh script located in examples/bash directory.

There is also an API to be used in Python code. It is located in the common package and is called DataPreparationAPI.py.

  • Last update: 2017/09/20
  • Link: https://github.com/idiap/asrt


IRISA text normalizer

Text normalisation tools from IRISA lab.

The tools provided here are split into 3 steps:

  1. Tokenisation (adding blanks around punctuation marks, dealing with special cases like URLs, etc.)
  2. Generic normalization (leading to homogeneous texts where (almost) no information has been lost and where tags have been added for some entities)
  3. Specific normalisation (projection of the generic texts into specific forms)
  • Last update: 2018/01/09
  • Link: https://github.com/glecorve/irisa-text-normalizer

Dictionary related tools

CMU Pronunciation Dictionary Tools

Tools for working with the CMU Pronunciation Dictionary

  • Last update: 2015/02/23
  • Link: https://github.com/cmusphinx/cmudict-tools

ISS scripts for dictionary maintenance

These scripts are sufficient to convert the distributed forms of dictionaries into forms useful for our tools (notably HTK and ISS). Once a dictionary is in a standard form, the generic tools in ISS can be used to manipulate it further.

  • Last update: 2017/07/04
  • Link: https://github.com/idiap/iss-dicts

Backend (Acoustic part)

Unit selection

HMM based

MAGE

MAGE is a C/C++ software toolkit for reactive implementation of HMM-based speech and singing synthesis.

  • Last update: 2014/07/18
  • Link: https://github.com/numediart/mage

HMM-Based Speech Synthesis System (HTS)

The basic core system of HTS, available from NITECH, was implemented as a modified version of HTK together with SPTK (see below), and is released as the HMM-Based Speech Synthesis System (HTS) in the form of patch code to HTK.

  • Last update: 2016/12/25
  • Link: http://hts.sp.nitech.ac.jp/

HTS Engine

htsengine is a small run-time synthesis engine (less than 1 MB including acoustic models), which can run without the HTK library. The current version does not include any text analyzer, but the Festival Speech Synthesis System can be used as a text analyzer.

  • Last update: 2015/12/25
  • Link: http://hts-engine.sourceforge.net/

DNN based

MERLIN

Merlin is a toolkit for building Deep Neural Network models for statistical parametric speech synthesis. It must be used in combination with a front-end text processor (e.g., Festival) and a vocoder (e.g., STRAIGHT or WORLD).

The system is written in Python and relies on the Theano numerical computation library.

Merlin comes with recipes (in the spirit of the Kaldi automatic speech recognition toolkit) to show you how to build state-of-the-art systems.

  • Last update: 2017/09/29
  • Link: http://www.cstr.ed.ac.uk/projects/merlin
  • Reference:

IDLAK

Idlak is a project to build an end-to-end parametric TTS system within Kaldi, to be distributed with the same licence.

It contains a robust front-end, voice building tools, speech analysis utilities, and DNN tools suitable for parametric synthesis. It also contains an example of using Idlak as an end-to-end TTS system, in egs/ttsdnnarctic/s1.

Note that the Kaldi structure has been maintained and the tool building procedure is identical.

  • Last update: 2017/07/03
  • Link: https://github.com/bpotard/idlak
  • Reference:

CURRENNT scripts

The scripts and examples on the modified CURRENNT toolkit

  • Last update: 2017/08/27
  • Link: https://github.com/TonyWangX/CURRENNT_SCRIPTS

Wavenet based

tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

  • Last update: 2017/05/23
  • Link: https://github.com/ibab/tensorflow-wavenet

Other

End-to-end (text to audio)

barronalex/Tacotron

Implementation of Google's Tacotron in TensorFlow

  • Last update: 2017/08/08
  • Link: https://github.com/barronalex/Tacotron

keithito/tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model

  • Last update: 2017/11/06
  • Link: https://github.com/keithito/tacotron

Char2Wav: End-to-End Speech Synthesis

This repo has the code for our ICLR submission:

Jose Sotelo, Soroush Mehri, Kundan Kumar, João Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio. Char2Wav: End-to-End Speech Synthesis.

The website is here.

  • Last update: 2017/02/28
  • Link: https://github.com/sotelo/parrot
  • Reference:

Signal processing

Vocoder, Glottal modelling

STRAIGHT

STRAIGHT is a tool for manipulating voice quality, timbre, pitch, speed and other attributes flexibly. It is an always evolving system for attaining better sound quality, that is close to the original natural speech, by introducing advanced signal processing algorithms and findings in computational aspects of auditory processing.

STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, the initial design objective of this tool, and to interpret experimental results in terms of the huge body of classical studies.

  • Last update:
  • Link: http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html
  • Reference:

World

WORLD is free software for high-quality speech analysis, manipulation and synthesis. It can estimate the fundamental frequency (F0), aperiodicity and spectral envelope, and can also generate speech resembling the input speech from only the estimated parameters.

This source code is released under the modified-BSD license. None of the algorithms in WORLD are patented.

  • Last update: 2017/08/23
  • Link: https://github.com/mmorise/World
  • Reference:

Covarep - A Cooperative Voice Analysis Repository for Speech Technologies

Covarep is an open-source repository of advanced speech processing algorithms and is stored as a GitHub project (https://github.com/covarep/covarep) where researchers in speech processing can store original implementations of published algorithms.

Over the past few decades a vast array of advanced speech processing algorithms have been developed, often offering significant improvements over the existing state-of-the-art. Such algorithms can have a reasonably high degree of complexity and, hence, can be difficult to accurately re-implement based on article descriptions. Another issue is the so-called 'bug magnet effect', with re-implementations frequently having significant differences from the original ones. The consequence of all this has been that many promising developments have been under-exploited or discarded, with researchers tending to stick to conventional analysis methods.

By developing Covarep we are hoping to address this by encouraging authors to include original implementations of their algorithms, thus resulting in a single de facto version for the speech community to refer to.

  • Last update: 2016/10/16
  • Link: https://github.com/covarep/covarep
  • Reference:

MagPhase Vocoder

Speech analysis/synthesis system for TTS and related applications.

This software is based on the method described in the paper:

  1. Espic, C. Valentini-Botinhao, and S. King, “Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis,” in Proc. Interspeech, Stockholm, Sweden, August, 2017.
  • Last update: 2017/08/30
  • Link: https://github.com/CSTR-Edinburgh/magphase
  • Reference:

WavGenSR

Waveform generator based on signal reshaping for statistical parametric speech synthesis.

  • Last update: 2017/08/30
  • Link: https://github.com/CSTR-Edinburgh/WavGenSR
  • Reference:

Pulse model analysis and synthesis

It is basically the vocoder described in:

  1. Degottex, P. Lanchantin, and M. Gales, 'A Pulse Model in Log-domain for a Uniform Synthesizer,' in Proc. 9th Speech Synthesis Workshop (SSW9), 2016.
  • Last update: 2017/09/7
  • Link: https://github.com/gillesdegottex/pulsemodel
  • Reference:

YANG VOCODER: Yet-ANother-Generalized VOCODER

Yet another vocoder that is not STRAIGHT.

This project is a state-of-the-art vocoder that parameterizes the speech signal into a representation that is amenable to statistical manipulation.

The VOCODER was developed by Hideki Kawahara during his internship at Google.

  • Last update: 2017/01/02
  • Link: https://github.com/google/yang_vocoder

Ahocoder

Ahocoder parameterizes speech waveforms into three different streams: log-f0, cepstral representation of the spectral envelope, and maximum voiced frequency. It provides high accuracy during analysis and high quality during reconstruction. It is adequate for statistical parametric speech synthesis and voice conversion. Furthermore, it can be used just for basic speech manipulation and transformation (pitch level and variance, speaking rate, vocal tract length…).

Ahocoder is reported to be a very good complement for HTS. The output files generated by Ahocoder contain float numbers without header, so they are fully compatible with the HTS demo scripts in the HTS website. You can use the same configuration as in the STRAIGHT-based demo, using the 'bap' stream to handle maximum voiced frequency (set its dimension to 1 both in data/Makefile and in scripts/Config.pm).

  • Last update: 2014
  • Link: http://aholab.ehu.es/ahocoder/

PhonVoc: Phonetic and Phonological vocoding

This is a computational platform for Phonetic and Phonological vocoding, released under the BSD licence. See file COPYING for details. The software is based on Kaldi (v. 489a1f5) and Idiap SSP. For training of the analysis and synthesis models, please follow train/README.txt.

  • Last update: 2016/11/23
  • Link: https://github.com/idiap/phonvoc

GlottGAN

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

  • Last update: 2017/05/30
  • Link: https://github.com/bajibabu/GlottGAN
  • Reference:

Postfilt gan

This is an implementation of 'Generative adversarial network-based postfilter for statistical parametric speech synthesis'

Please check the run.sh file to train the system. Currently, the testing part is not yet implemented.

  • Last update: 2017/07/06
  • Link: https://github.com/bajibabu/postfilt_gna
  • Reference:

Pitch extractor

REAPER: Robust Epoch And Pitch EstimatoR

This is a speech processing system. The reaper program uses the EpochTracker class to simultaneously estimate the location of voiced-speech 'epochs' or glottal closure instants (GCI), voicing state (voiced or unvoiced) and fundamental frequency (F0 or 'pitch'). We define the local (instantaneous) F0 as the inverse of the time between successive GCI.

This code was developed by David Talkin at Google. This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.

  • Last update: 2015/03/04
  • Link: https://github.com/google/REAPER

SSP - Speech Signal Processing module

SSP is a package for doing signal processing in Python; the functionality is biased towards speech signals. Top level programs include a feature extractor for speech recognition, and a vocoder for both coding and speech synthesis. The vocoder is based on linear prediction, but with several experimental excitation models. A continuous pitch extraction algorithm is also provided, built around standard components and a Kalman filter.

There is a 'sister' package, libssp, that includes translations of some algorithms in C++. Libssp is built around libube that makes this translation easier.

SSP is released under a BSD licence. See the file COPYING for details.

  • Last update: 2017/04/16
  • Link: https://github.com/idiap/ssp

Sample modelling

SampleRNN

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

  • Last update:
  • Link: https://github.com/soroushmehr/sampleRNN_ICLR2017

Toolkits

SPTK - Speech Signal Processing Toolkit

The main feature of the Speech Signal Processing Toolkit, available from NITECH, is that not only standard speech analysis and synthesis techniques (e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR synthesis filter, LSP synthesis filter, and vector quantization techniques) but also speech analysis and synthesis techniques developed at the research group can easily be used.

  • Last update: 2016/12/25
  • Link: http://sp-tk.sourceforge.net/

Singing synthesizer

Sinsy

Sinsy is an HMM-based singing voice synthesis system.

  • Last update: 2015/12/25
  • Link: http://sinsy.sourceforge.net/

Ebook reader

Bard Storyteller ebook reader

Bard Storyteller is a text reader. Bard not only allows a user to read books, but can also read books to the user using text-to-speech. It supports txt, epub and (x)html files.

  • Last update: 2014/07
  • Link: http://festvox.org/bard/

Various tools

SparkNG

Matlab realtime speech tools and voice production tools

  • Last update: 2017/06/29
  • Link: http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/

Articulatory synthesizer

KLAIR - A virtual infant for spoken language acquisition research

The KLAIR project aims to build and develop a computational platform to assist research into the acquisition of spoken language. The main part of KLAIR is a sensori-motor server that displays a virtual infant on screen that can see, hear and speak. Behind the scenes, the server can talk to one or more client applications. Each client can monitor the audio visual input to the server and can send articulatory gestures to the head for it to speak through an articulatory synthesizer. Clients can also control the position of the head and the eyes as well as setting facial expressions. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction.

  • Last update:
  • Link: http://www.phon.ucl.ac.uk/project/klair/
  • Reference:

Vocaltractlab

VocalTractLab stands for 'Vocal Tract Laboratory' and is an interactive multimedia software tool to demonstrate the mechanism of speech production. It is meant to facilitate an intuitive understanding of speech production for students of phonetics and related disciplines.

The current versions of VocalTractLab are free of charge. Only a registration code, which you can request by email, will be necessary to activate the software. VocalTractLab is written for Windows operating systems (XP or higher), but a porting to Linux/Unix is conceivable for the future.

  • Last update: 2016
  • Link: http://www.vocaltractlab.de/

API/Library

Speech Tools

The Edinburgh Speech Tools Library is a collection of C++ classes, functions and related programs for manipulating the sorts of objects used in speech processing. It includes support for reading and writing waveforms, parameter files (LPC, cepstra, F0) in various formats and converting between them. It also includes support for linguistic type objects and support for various label files and ngrams (with smoothing).

In addition to the library, a number of programs are included: an intonation library which includes a pitch tracker, smoother and labelling system (using the Tilt labelling system), and a classification and regression tree (CART) building program called wagon. Also there is growing support for various speech recognition classes such as decoders and HMMs.

The Edinburgh Speech Tools Library is not an end in itself but is designed to make the construction of other speech systems easy. It is used, for example, to provide the underlying classes in the Festival Speech Synthesis System.

The speech tools are currently distributed in full source form, free for unrestricted use.

  • Last update: 2015/01/06
  • Link: http://www.cstr.ed.ac.uk/projects/speech_tools/

ROOTS

Roots is an open source toolkit dedicated to annotated sequential data generation, management and processing. It is made of a core library and of a collection of utility scripts. A rich API is available in C++ and in Perl.

  • Last update: 2015/07/01
  • Link: http://roots-toolkit.gforge.inria.fr/
  • Reference:

Visualization & annotation tools

Praat

Praat is a system for doing phonetics by computer. The computer program Praat is a research, publication, and productivity tool for phoneticians. With it, you can analyse, synthesize, and manipulate speech, and create high-quality pictures for your articles and thesis.

  • Last update:
  • Link: http://www.fon.hum.uva.nl/praat/
  • Reference:

KPE

KPE provides a graphical interface for the implementation of the Klatt 1980 formant synthesiser. The interface allows users to display and edit Klatt parameters using a graphical display which includes the time-amplitude waveform of both the original speech and its synthetic copy, and some signal analysis facilities.

  • Last update:
  • Link: http://www.speech.cs.cmu.edu/comp.speech/Section5/Synth/klatt.kpe80.html

Wavesurfer

WaveSurfer is a tool for doing speech analysis. The analysis features include formant and pitch extraction and real time spectrograms. The WaveSurfer tool, built on top of the Snack speech visualization module, is highly modular and extensible at several levels.

  • Last update:
  • Link: https://sourceforge.net/projects/wavesurfer/

Resources

Dictionary

Unisyn lexicon

The Unisyn lexicon is a master lexicon transcribed in keysymbols, a kind of metaphoneme which allows the encoding of multiple accents of English.

The lexicon is accompanied by a number of perl scripts which transform the base lexicon via phonological and allophonic rules, and other symbol changes, to produce output transcriptions in different accents. The rules can be applied to the whole lexicon, to produce an accent-specific lexicon, or to running text. Output can be displayed in keysymbols, SAMPA, or IPA.

The system uses a geographically-based accent hierarchy, with a tree structure describing countries, regions, towns and speakers; this hierarchy is used to specify the application of rules and other pronunciation features.

The lexicon system is customisable, and the documentation explains how to modify output by switching rules on and off, adding new rules or editing existing ones. The user can also add new nodes in the accent hierarchy (new accents or new speakers within an accent), or add new symbols.

A number of UK, US, Australian and New Zealand accents are included in the release.

The scripts run under Unix, or Windows 98 (DOS), and use Perl 5.6.0.

  • Last update:
  • Link: http://www.cstr.ed.ac.uk/projects/unisyn/

Combilex

Combilex GA is a keyword-based lexicon for the General American pronunciation.

The Combilex contains c. 145,000 entries, including the 20,000 most frequent words, with a variety of linguistic information alongside detailed pronunciations, including many useful proper names.

Combilex GA is an ASCII text file, one entry-per-line, which is easily adaptable for use in text-to-speech synthesis (voice-building or run-time synthesis) and in speech recognition systems.

Full manually notated orthographic-phonemic correspondences are included, allowing derivation of accurate grapheme-to-phoneme rules.

  • Last update:
  • Link: https://licensing.edinburgh-innovations.ed.ac.uk/item.php?item=combilex-ga
  • Reference: