Package: conText 2.0.0

conText: 'a la Carte' on Text (ConText) Embedding Regression

A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388>, and to evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021) <https://github.com/prodriguezsosa/EmbeddingRegression>.
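
As a rough illustration of the 'a la carte' idea described above, the sketch below embeds one target instance by averaging the pre-trained vectors of its context words and applying a linear transformation matrix A. It uses base R only; the function name alc_embed_sketch, the toy embeddings and the values in A are all made up for this example and are not part of the package.

```r
# Toy pre-trained embeddings: 4 words x 3 dimensions (values are invented).
pre_trained <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("tax", "cut", "spend", "budget"), NULL)
)

# Hypothetical transformation matrix A. In practice A is estimated from a
# corpus; a fixed diagonal matrix is assumed here to keep the sketch short.
A <- diag(c(2, 1, 0.5))

# Hypothetical helper: ALC embedding of one target instance, computed as
# A times the average of the context word vectors found in pre_trained.
alc_embed_sketch <- function(context_words, pre_trained, A) {
  # Keep only context words that have a pre-trained vector (repeats are kept).
  known <- context_words[context_words %in% rownames(pre_trained)]
  if (length(known) == 0) stop("no context words found in pre_trained")
  avg <- colMeans(pre_trained[known, , drop = FALSE])
  as.numeric(A %*% avg)
}

# "unknown" has no pre-trained vector and is dropped before averaging.
emb <- alc_embed_sketch(c("tax", "cut", "unknown"), pre_trained, A)
```

Setting A to the identity matrix in this sketch reduces it to simple, untransformed averaging of the context embeddings.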

Authors: Pedro L. Rodriguez [aut, cre, cph], Arthur Spirling [aut], Brandon Stewart [aut], Christopher Barrie [ctb]

conText_2.0.0.tar.gz
conText_2.0.0.zip (r-4.5), conText_2.0.0.zip (r-4.4), conText_2.0.0.zip (r-4.3)
conText_2.0.0.tgz (r-4.4-any), conText_2.0.0.tgz (r-4.3-any)
conText_2.0.0.tar.gz (r-4.5-noble), conText_2.0.0.tar.gz (r-4.4-noble)
conText_2.0.0.tgz (r-4.4-emscripten), conText_2.0.0.tgz (r-4.3-emscripten)
conText.pdf | conText.html
conText/json (API)

# Install 'conText' in R:
install.packages('conText', repos = c('https://prodriguezsosa.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/prodriguezsosa/context/issues

29 exports, 97 stars, 4.25 score, 58 dependencies, 1.6k scripts, 278 downloads

Last updated 5 months ago from: c373ea228e. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Aug 20 2024
R-4.5-win | NOTE | Aug 20 2024
R-4.5-linux | NOTE | Aug 20 2024
R-4.4-win | NOTE | Aug 20 2024
R-4.4-mac | NOTE | Aug 20 2024
R-4.3-win | NOTE | Aug 20 2024
R-4.3-mac | NOTE | Aug 20 2024

Exports: bootstrap_contrast, bootstrap_nns, compute_contrast, compute_transform, conText, contrast_nns, cos_sim, dem, dem_group, dem_sample, embed_target, feature_sim, fem, find_nns, get_context, get_cos_sim, get_grouped_similarity, get_local_vocab, get_ncs, get_nns, get_nns_ratio, get_seq_cos_sim, ncs, nns, nns_ratio, permute_contrast, plot_nns_ratio, prototypical_context, tokens_context

Dependencies: cli, colorspace, cpp11, data.table, digest, dplyr, fansi, farver, fastDummies, fastmatch, float, generics, ggplot2, glue, gtable, isoband, ISOcodes, jsonlite, labeling, lattice, lgr, lifecycle, magrittr, MASS, Matrix, MatrixExtra, mgcv, mlapi, munsell, nlme, pillar, pkgconfig, plyr, purrr, quanteda, R6, RColorBrewer, Rcpp, RcppArmadillo, reshape2, RhpcBLASctl, rlang, rsparse, scales, SnowballC, stopwords, stringi, stringr, text2vec, tibble, tidyr, tidyselect, utf8, vctrs, viridisLite, withr, xml2, yaml

Finite Sample Bias

Rendered from finite_sample_bias.Rmd using knitr::rmarkdown on Aug 20 2024.

Last update: 2023-08-04
Started: 2023-08-04

Quick Start Guide

Rendered from quickstart.Rmd using knitr::rmarkdown on Aug 20 2024.

Last update: 2023-08-04
Started: 2021-03-06

Readme and manuals

Help Manual

Help page | Topics
Bootstrap similarity and ratio computations | bootstrap_contrast
Bootstrap nearest neighbors | bootstrap_nns
Bootstrap OLS | bootstrap_ols
Bootstrap similarity vector | bootstrap_similarity
Build a 'conText-class' object | build_conText
Build a 'dem-class' object | build_dem
Build a 'fem-class' object | build_fem
Compute similarity and similarity ratios | compute_contrast
Compute similarity vector (sub-function of bootstrap_similarity) | compute_similarity
Compute transformation matrix A | compute_transform
Embedding regression | conText
Contrast nearest neighbors | contrast_nns
Compute the cosine similarity between one or more ALC embeddings and a set of features. | cos_sim
GloVe subset | cr_glove_subset
Congressional Record sample corpus | cr_sample_corpus
Transformation matrix | cr_transform
Build a document-embedding matrix | dem
Average document-embeddings in a dem by a grouping variable | dem_group
Randomly sample documents from a dem | dem_sample
Embed target using either: (a) a la carte OR (b) simple (untransformed) averaging of context embeddings | embed_target
Given two feature-embedding matrices, compute "parallel" cosine similarities between overlapping features. | feature_sim
Create a feature-embedding matrix | fem
Find cosine similarities between target and candidate words | find_cos_sim
Return nearest neighbors based on cosine similarity | find_nns
Get context words (words within a symmetric window around the target word/phrase) surrounding a user-defined target. | get_context
Given a tokenized corpus, compute the cosine similarities of the resulting ALC embeddings and a defined set of features. | get_cos_sim
Get averaged similarity scores between target word(s) and one or two vectors of candidate words. | get_grouped_similarity
Identify words common to a collection of texts and a set of pretrained embeddings. | get_local_vocab
Given a set of tokenized contexts, find the top N nearest contexts. | get_ncs
Given a tokenized corpus and a set of candidate neighbors, find the top N nearest neighbors. | get_nns
Given a corpus and a binary grouping variable, compute the ratio of cosine similarities over the union of their respective N nearest neighbors. | get_nns_ratio
Calculate cosine similarities between target word and candidate words over a sequenced variable using the ALC embedding approach | get_seq_cos_sim
Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts. | ncs
Given a set of embeddings and a set of candidate neighbors, find the top N nearest neighbors. | nns
Compute the ratio of cosine similarities for two embeddings over the union of their respective top N nearest neighbors. | nns_ratio
Permute similarity and ratio computations | permute_contrast
Permute OLS | permute_ols
Plot output of 'get_nns_ratio()' | plot_nns_ratio
Find most "prototypical" contexts. | prototypical_context
Run jackknife debiased OLS | run_jack_ols
Run OLS | run_ols
Get the tokens of contexts surrounding user-defined patterns | tokens_context
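
Several of the topics above rank candidate words by cosine similarity to an embedding. A minimal base R sketch of that ranking is given below. The helper names cosine_sim_sketch and top_n_sketch and the toy candidate matrix are hypothetical, and the code does not reproduce the package's own functions or their arguments.

```r
# Hypothetical helper: cosine similarity between a vector x and each row of M.
cosine_sim_sketch <- function(x, M) {
  as.numeric(M %*% x) / (sqrt(rowSums(M^2)) * sqrt(sum(x^2)))
}

# Hypothetical helper: the N candidates (rows of M) most similar to x.
top_n_sketch <- function(x, M, N = 2) {
  sims <- cosine_sim_sketch(x, M)
  names(sims) <- rownames(M)
  head(sort(sims, decreasing = TRUE), N)
}

# Toy candidate embeddings: 3 words x 2 dimensions (values are invented).
candidates <- matrix(
  c(1, 0,
    0, 1,
    1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), NULL)
)

# Rank the candidates against a target embedding pointing along dimension 1.
neighbors <- top_n_sketch(c(1, 0), candidates, N = 2)
```

Here `neighbors` is a named numeric vector, with candidate words as names and cosine similarities as values, sorted from most to least similar.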