| Modifier and Type | Method and Description | 
|---|---|
| `scala.Tuple2<int[],double[]>[]` | `describeTopics(int maxTermsPerTopic)` Return the topics described by weighted terms. |
| `Vector` | `docConcentration()` Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). |
| `JavaRDD<scala.Tuple3<Long,int[],int[]>>` | `javaTopicAssignments()` Java-friendly version of `topicAssignments`. |
| `JavaPairRDD<Long,Vector>` | `javaTopicDistributions()` Java-friendly version of `topicDistributions`. |
| `JavaRDD<scala.Tuple3<Long,int[],double[]>>` | `javaTopTopicsPerDocument(int k)` Java-friendly version of `topTopicsPerDocument`. |
| `int` | `k()` Number of topics. |
| `static DistributedLDAModel` | `load(SparkContext sc, String path)` |
| `double` | `logLikelihood()` Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs \| topics, topic distributions for docs, alpha, eta). |
| `double` | `logPrior()` Log probability of the current parameter estimate: log P(topics, topic distributions for docs \| alpha, eta). |
| `void` | `save(SparkContext sc, String path)` Save this model to the given path. |
| `LocalLDAModel` | `toLocal()` Convert model to a local model. |
| `scala.Tuple2<long[],double[]>[]` | `topDocumentsPerTopic(int maxDocumentsPerTopic)` Return the top documents for each topic. |
| `RDD<scala.Tuple3<Object,int[],int[]>>` | `topicAssignments()` Return the top topic for each (doc, term) pair. |
| `double` | `topicConcentration()` Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. |
| `RDD<scala.Tuple2<Object,Vector>>` | `topicDistributions()` For each document in the training set, return the distribution over topics for that document ("theta_doc"). |
| `Matrix` | `topicsMatrix()` Inferred topics, where each topic is represented by a distribution over terms. |
| `RDD<scala.Tuple3<Object,int[],double[]>>` | `topTopicsPerDocument(int k)` For each document, return the top k weighted topics for that document and their weights. |
| `int` | `vocabSize()` Vocabulary size (number of terms in the vocabulary). |
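
Several methods in the summary, such as topTopicsPerDocument, return parallel arrays of indices and weights. The following plain-Java sketch illustrates what "top k weighted topics and their weights" means for a single document's topic distribution. It does not use Spark and is not Spark's implementation: the class `TopTopicsSketch`, its helper methods, and the ordering by decreasing weight are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; not part of Spark.
class TopTopicsSketch {
    /** Indices of the k largest entries of weights, ordered by decreasing weight. */
    static int[] topKIndices(double[] weights, int k) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort by descending weight; ties keep the lower index first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -weights[i]));
        int n = Math.min(k, order.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = order[i];
        }
        return top;
    }

    /** Weights found at the given indices, in the same order as the indices. */
    static double[] weightsAt(double[] weights, int[] indices) {
        double[] out = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = weights[indices[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed topic distribution ("theta_doc") for one document over 4 topics.
        double[] theta = {0.10, 0.55, 0.05, 0.30};
        int[] topics = topKIndices(theta, 2);
        double[] topicWeights = weightsAt(theta, topics);
        System.out.println(Arrays.toString(topics) + " " + Arrays.toString(topicWeights));
    }
}
```

In the real API the two arrays arrive together with the document ID as one `scala.Tuple3<Long,int[],double[]>` per document.
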
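public static DistributedLDAModel load(SparkContext sc, String path)

public int k()
Description copied from class: LDAModel
Number of topics

public int vocabSize()
Description copied from class: LDAModel
Vocabulary size (number of terms in the vocabulary)

public Vector docConcentration()
Description copied from class: LDAModel
Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
This is the parameter to a Dirichlet distribution.
Specified by: docConcentration in class LDAModel

public double topicConcentration()
Description copied from class: LDAModel
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Specified by: topicConcentration in class LDAModel

public LocalLDAModel toLocal()
Convert model to a local model.
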
public Matrix topicsMatrix()
WARNING: This matrix is collected from an RDD. Beware memory usage when vocabSize, k are large.
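
The warning can be made concrete with a rough size estimate. This plain-Java sketch assumes the collected matrix is dense, with one 8-byte double per (term, topic) entry, so it needs about vocabSize × k × 8 bytes on the driver. The class and method names are hypothetical, and object overhead is ignored.

```java
// Hypothetical illustration class; not part of Spark.
class TopicsMatrixMemorySketch {
    /** Approximate bytes for a dense vocabSize x k matrix of 8-byte doubles. */
    static long denseMatrixBytes(long vocabSize, long k) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(vocabSize, k), (long) Double.BYTES);
    }

    public static void main(String[] args) {
        // Assumed example sizes: one million terms, 500 topics.
        long bytes = denseMatrixBytes(1_000_000L, 500L);
        System.out.println(bytes + " bytes, about "
                + (bytes / (1024L * 1024L)) + " MiB");
    }
}
```

Running an estimate like this before calling topicsMatrix() helps decide whether the result will fit in driver memory.
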
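Specified by: topicsMatrix in class LDAModel

public scala.Tuple2<int[],double[]>[] describeTopics(int maxTermsPerTopic)
Description copied from class: LDAModel
Return the topics described by weighted terms.
Specified by: describeTopics in class LDAModel
Parameters:
maxTermsPerTopic - Maximum number of terms to collect for each topic.

public scala.Tuple2<long[],double[]>[] topDocumentsPerTopic(int maxDocumentsPerTopic)
Return the top documents for each topic.
Parameters:
maxDocumentsPerTopic - Maximum number of documents to collect for each topic.

public RDD<scala.Tuple3<Object,int[],int[]>> topicAssignments()
Return the top topic for each (doc, term) pair.

public JavaRDD<scala.Tuple3<Long,int[],int[]>> javaTopicAssignments()
Java-friendly version of topicAssignments

public double logLikelihood()
Log likelihood of the observed tokens in the training set, given the current parameter estimates:
 log P(docs | topics, topic distributions for docs, alpha, eta)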
 Note:
  - This excludes the prior; for that, use logPrior.
  - Even with logPrior, this is NOT the same as the data log likelihood given the
    hyperparameters.
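
The note can be written out with the product rule. In this sketch, \theta stands for the per-document topic distributions; that notation is an assumption of the example rather than something taken from the API documentation.

```latex
\begin{align*}
\underbrace{\log P(\text{docs} \mid \text{topics}, \theta, \alpha, \eta)}_{\texttt{logLikelihood}}
  + \underbrace{\log P(\text{topics}, \theta \mid \alpha, \eta)}_{\texttt{logPrior}}
  &= \log P(\text{docs}, \text{topics}, \theta \mid \alpha, \eta) \\
\log P(\text{docs} \mid \alpha, \eta)
  &= \log \int\!\!\int P(\text{docs}, \text{topics}, \theta \mid \alpha, \eta)
     \, d\,\text{topics} \, d\theta
\end{align*}
```

The sum of the two methods is a joint log probability evaluated at the current point estimates, whereas the data log likelihood given the hyperparameters marginalizes over the topics and \theta, so the two generally differ.
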
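
public double logPrior()
Log probability of the current parameter estimate:
 log P(topics, topic distributions for docs | alpha, eta)

public RDD<scala.Tuple2<Object,Vector>> topicDistributions()
For each document in the training set, return the distribution over topics for that document ("theta_doc").

public JavaPairRDD<Long,Vector> javaTopicDistributions()
Java-friendly version of topicDistributions

public RDD<scala.Tuple3<Object,int[],double[]>> topTopicsPerDocument(int k)
For each document, return the top k weighted topics for that document and their weights.
Parameters:
k - (undocumented)

public JavaRDD<scala.Tuple3<Long,int[],double[]>> javaTopTopicsPerDocument(int k)
Java-friendly version of topTopicsPerDocument
Parameters:
k - (undocumented)

public void save(SparkContext sc, String path)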
Description copied from interface: Saveable
Save this model to the given path.

This saves:
  - human-readable (JSON) model metadata to path/metadata/
  - Parquet formatted data to path/data/

The model may be loaded using Loader.load.

Parameters:
sc - Spark context used to save model data.
path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
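
Because save fails when the target directory already exists, it can help to check the path first. The following plain-Java sketch covers only the local-filesystem case, which is an assumption: Spark paths often point at HDFS or object storage, where this check does not apply. The class `SavePathSketch` and its methods are hypothetical helpers, not Spark API; the `metadata` and `data` sub-directory names come from the description above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration class; not part of Spark.
class SavePathSketch {
    /** True if nothing exists at the target yet, so a save would not collide. */
    static boolean isFreshTarget(Path target) {
        return Files.notExists(target);
    }

    /** Location of the human-readable (JSON) metadata under a saved model directory. */
    static Path metadataDir(Path modelDir) {
        return modelDir.resolve("metadata");
    }

    /** Location of the Parquet formatted data under a saved model directory. */
    static Path dataDir(Path modelDir) {
        return modelDir.resolve("data");
    }

    public static void main(String[] args) {
        // Assumed local target; a unique suffix keeps the example from colliding.
        Path target = Paths.get(System.getProperty("java.io.tmpdir"),
                "lda-model-" + System.nanoTime());
        if (isFreshTarget(target)) {
            System.out.println("Safe to save; metadata would go to " + metadataDir(target)
                    + " and data to " + dataDir(target));
        } else {
            System.out.println("Target exists; choose another path or remove it first.");
        }
    }
}
```
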