Poo Face
I just started working on a general, extensible framework and toolkit for document classification, authorship attribution, and general information analysis. It's called Poo Face, and the code can be found on the Small Code page for now. More information to come.
Target Features
Here's a loosely organized list of things that it should be able to do for you. I'm not sure yet how I'm going to handle the visualization side of these things, but I will probably start by supporting some common graph formats.
Document Scoring
- Implemented Methods
- N-Gram / Markov Chains
- Development Methods
- Sequitur
- Wanted Methods
- Naive Bayesian
- Sequitur
- I,Sushi
- LZ77
- Huffman Encoding
- Linguistic Collocation
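To give a feel for what N-Gram / Markov Chain scoring means, here is a minimal sketch in Python. It is not the Poo Face code: the function names `train_ngram_model` and `score`, the character-level trigram default, and the add-one smoothing with a 256-symbol alphabet are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_ngram_model(text, n=3):
    """Count how often each character follows each (n-1)-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        counts[context][nxt] += 1
    return counts


def score(model, text, n=3, alphabet_size=256):
    """Average log-probability per transition under the model.

    Uses add-one smoothing (an assumption of this sketch) so that
    unseen transitions get a small non-zero probability.
    """
    total, steps = 0.0, 0
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        seen = model.get(context, Counter())
        prob = (seen[nxt] + 1) / (sum(seen.values()) + alphabet_size)
        total += math.log(prob)
        steps += 1
    # Texts shorter than n have no transitions to score.
    return total / steps if steps else float("-inf")


if __name__ == "__main__":
    model = train_ngram_model("abababababab")
    # A text that resembles the training data should score higher.
    print(score(model, "ababab") > score(model, "xyzxyz"))
```

For authorship attribution, one model would be trained per candidate author and an unknown document assigned to whichever model gives it the highest score.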
Sequence Alignment
- Wanted Methods
- Needleman-Wunsch
- Sellers' Algorithm
- Smith-Waterman
- Longest Common Sub-Sequence
- Longest Common Sub-String
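As a rough idea of what the simplest of these alignment methods involves, here is a sketch of Longest Common Sub-Sequence length using the standard dynamic-programming recurrence. Nothing like this is in Poo Face yet; the function name `lcs_length` is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Keeps only the previous row of the DP table, so memory use is
    proportional to len(b) rather than len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                # Extend the best subsequence ending before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one item from either sequence, keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))  # 4, for the subsequence "GTAB"
```

The same table-filling shape carries over to Needleman-Wunsch and Smith-Waterman, which replace the match/skip choice with substitution scores and gap penalties.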
Clustering Methods
- Wanted Methods
- Greedy Spanning Tree
- Dijkstra's SPF
- Nearest Neighbor Clustering
- Network Flow
- Clique Detection
Adaptive Classification Systems
- Wanted Methods
- Bayesian Training
- Support-Vector Algorithm
- Neural Network
- Decision Tree Modeling
Download
Download it from the Small Code page under Poo Face. Right now it's actually beyond simplistic and undocumented. This is going to change; this is a project that I will continue to support.

