I have the following problem: I converted a corpus into a dfm, and this dfm contains some all-zero rows (documents with no remaining features) that I need to remove before fitting an LDA model. I would usually do it as follows:
OutDfm <- dfm_trim(
  dfm(corpus,
      tolower = TRUE,
      remove = c(stopwords("english"), stopwords("german"),
                 stopwords("french"), stopwords("italian")),
      remove_punct = TRUE,
      remove_numbers = TRUE,
      remove_separators = TRUE,
      stem = TRUE,
      verbose = TRUE),
  min_docfreq = 5
)
Creating a dfm from a corpus input...
... lowercasing
... found 272,912 documents, 112,588 features
... removed 613 features
... stemming features (English)
, trimmed 27491 feature variants
... created a 272,912 x 84,515 sparse dfm
... complete.
Elapsed time: 78.7 seconds.
# remove all-zero rows
raw.sum <- apply(OutDfm, 1, FUN = sum)
which(raw.sum == 0)
OutDfm <- OutDfm[raw.sum != 0, ]
However, when I run these last operations I get:
Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
hinting that the matrix is too large to be coerced and manipulated.
Has anyone encountered and solved this issue before? Is there an alternative strategy for removing the all-zero rows?
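For reference, here is a minimal sketch of the same row-filtering on a toy sparse matrix, using rowSums() instead of apply(). rowSums() has a sparse method in the Matrix package, so it should avoid the dense coercion that apply() can trigger (this assumes the dfm is stored as a sparse Matrix object, as it is in the quanteda versions I have used; the matrix here is purely illustrative):

```r
library(Matrix)

# Toy sparse matrix standing in for the dfm; row 2 is all zeros
m <- sparseMatrix(i = c(1, 1, 3), j = c(1, 2, 2),
                  x = c(2, 1, 5), dims = c(3, 2))

# rowSums() dispatches to a sparse method, so no dense copy is made
raw.sum <- rowSums(m)

# Keep only rows with at least one non-zero entry
m.nonzero <- m[raw.sum != 0, ]
```

If this works on the full dfm, the same pattern (`OutDfm[rowSums(OutDfm) != 0, ]`) would replace the apply() line above.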
Thanks a lot!