Standards-based analyzers implemented with JFlex.
The org.apache.lucene.analysis.standard package contains three
    fast grammar-based tokenizers constructed with JFlex:
StandardTokenizer:
        as of Lucene 3.1, implements the Word Break rules from the Unicode Text
        Segmentation algorithm, as specified in
        Unicode Standard Annex #29 (UAX#29).
        Unlike UAX29URLEmailTokenizer, this tokenizer does not keep URLs and
        email addresses as single tokens; they are instead split into
        tokens according to the UAX#29 word break rules.
        StandardAnalyzer includes
        StandardTokenizer, 
        StandardFilter, 
        LowerCaseFilter
        and StopFilter.
        When the Version specified in the constructor is lower than
        3.1, the ClassicTokenizer implementation is invoked.
ClassicTokenizer:
        this class was formerly (prior to Lucene 3.1) named 
        StandardTokenizer.  (Its tokenization rules are not
        based on the Unicode Text Segmentation algorithm.)
        ClassicAnalyzer includes
        ClassicTokenizer,
        StandardFilter, 
        LowerCaseFilter
        and StopFilter.
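The analysis chains above can be exercised directly by draining a TokenStream. Below is a minimal sketch, assuming Lucene is on the classpath; the field name "f" and the sample text are arbitrary, and note that 3.x-era constructors take a Version argument, while recent releases offer the no-arg form shown here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardAnalyzerDemo {

    // Drain an analyzer's TokenStream and collect the emitted terms.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("f", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // StandardTokenizer applies the UAX#29 word break rules, so the
            // address is split at '@'; LowerCaseFilter lowercases the result.
            System.out.println(tokenize(analyzer, "Contact Admin@example.com"));
        }
    }
}
```

Swapping in ClassicAnalyzer in place of StandardAnalyzer runs the same chain with the pre-3.1 (non-UAX#29) tokenization rules instead.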
UAX29URLEmailTokenizer:
        implements the Word Break rules from the Unicode Text Segmentation
        algorithm, as specified in 
        Unicode Standard Annex #29.
        URLs and email addresses are also recognized, according to the relevant RFCs,
        and emitted as single tokens.
        UAX29URLEmailAnalyzer includes
        UAX29URLEmailTokenizer,
        StandardFilter,
        LowerCaseFilter
        and StopFilter.
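In contrast to StandardAnalyzer, UAX29URLEmailAnalyzer keeps an email address intact as one token. A hedged sketch follows, assuming Lucene is on the classpath; the analyzer's package has moved between major releases, so adjust the import to match your version, and older releases require a Version constructor argument rather than the no-arg form shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class UAX29URLEmailDemo {

    // Collect the terms the analyzer produces for the given text.
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (UAX29URLEmailAnalyzer analyzer = new UAX29URLEmailAnalyzer();
             TokenStream ts = analyzer.tokenStream("f", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // The whole address survives as a single (lowercased) token,
        // rather than being split at '@' as StandardAnalyzer would do.
        System.out.println(tokenize("Contact Admin@example.com"));
    }
}
```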