Code docs
-
class ananse.Ananse
A Python package to partially automate search term selection and write search strategies for systematic reviews.
-
create_dtm(doc, min_len=2, max_len=3, **kwargs)
This method creates a Document-Term Matrix (DTM).
- Parameters
min_len – minimum keyword length
max_len – maximum keyword length
doc – a list of article titles, abstracts or any other article property
keywords – a list of keywords to use for the Document-Term Matrix
dfm_type – whether the DTM should be built from document tokens or from a restricted list of keywords; options: token or keywords
- Returns
a multidimensional array holding the Document-Term Matrix, and a list of terms (its columns)
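To make the keyword-restricted layout concrete, here is a minimal sketch of such a matrix, not the package's implementation. It assumes each cell counts case-insensitive, whole-word occurrences of a keyword in a document; build_dtm is a hypothetical name.

```python
import re


def build_dtm(docs, keywords):
    """Hypothetical helper: rows are documents, columns are keywords.

    Each cell counts non-overlapping, case-insensitive, whole-word
    occurrences of the keyword in the document (an assumption).
    """
    patterns = [re.compile(r"\b" + re.escape(kw.lower()) + r"\b") for kw in keywords]
    dtm = []
    for doc in docs:
        text = doc.lower()
        dtm.append([len(p.findall(text)) for p in patterns])
    return dtm, list(keywords)


docs = [
    "Malaria control in rural Ghana",
    "Bed nets and malaria control: a systematic review",
]
dtm, terms = build_dtm(docs, ["malaria control", "bed nets", "ghana"])
print(terms)  # ['malaria control', 'bed nets', 'ghana']
print(dtm)    # [[1, 0, 1], [1, 1, 0]]
```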
-
create_network(im, keywords, draw_graph=False, save_network=False, save_directory=None)
This method creates a graph from a Document-Term Matrix given in the form of an incidence matrix.
- Parameters
im – the incidence matrix
keywords – keywords for labelling
draw_graph – if True, the graph is drawn
save_network – if True, saves the graph to a .png file
save_directory – the path to a directory where the graph will be saved if save_network is set to True
- Returns
a networkx graph
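One common way to turn a document-by-keyword incidence matrix into a keyword network is to link two keywords whenever they occur in the same document, weighting each edge by the number of such documents. The sketch below assumes that construction (it is not confirmed to be what create_network does) and returns a plain dict of weighted edges rather than a networkx graph; cooccurrence_edges is a hypothetical name.

```python
from itertools import combinations


def cooccurrence_edges(im, keywords):
    """Hypothetical helper: weighted keyword co-occurrence edges.

    im has one row per document and one column per keyword; a non-zero
    cell means the keyword occurs in that document. The weight of an
    edge is the number of documents in which both keywords occur.
    """
    edges = {}
    for row in im:
        present = [kw for kw, cell in zip(keywords, row) if cell]
        for a, b in combinations(present, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


im = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]
print(cooccurrence_edges(im, ["malaria", "bed nets", "ghana"]))
# {('malaria', 'ghana'): 2, ('malaria', 'bed nets'): 2, ('bed nets', 'ghana'): 1}
```

With networkx installed, each entry could be added to a graph with Graph.add_edge(a, b, weight=w).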
-
deduplicate_dataframe(DataFrame, columns)
This method deduplicates a DataFrame based on the given columns. It treats the first occurrence of a row as unique and deletes the other duplicates (inplace=True).
- Parameters
DataFrame – a pandas DataFrame to be deduplicated
columns – a list of columns to check for duplicate values when deduplicating the DataFrame
- Returns
the DataFrame with duplicate rows removed, according to the columns passed
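The behaviour described above matches pandas' own DataFrame.drop_duplicates with keep="first". A minimal sketch, assuming deduplicate_dataframe wraps something similar; the column names and rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "title": ["Malaria in Ghana", "malaria in ghana", "Bed nets", "Malaria in Ghana"],
        "year": [2019, 2019, 2020, 2019],
    }
)

# Keep the first occurrence of each (title, year) pair and drop the rest in place.
df.drop_duplicates(subset=["title", "year"], keep="first", inplace=True)
print(df)
```

Matching is exact, so the lower-case title in row 1 survives; normalising case beforehand would be one way to catch such near-duplicates.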
-
dtm_to_dataframe(dtm, keywords, doc)
This method creates a data frame from a Document-Term Matrix.
- Parameters
dtm – a multidimensional array of a Document-Term Matrix
keywords – a list of keywords to use for the Document-Term Matrix
doc – a list of article titles, abstracts or any other article property
- Returns
a data frame of a Document-Term Matrix
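A minimal sketch of the conversion, assuming the documents become the row labels and the keywords the column labels (the layout is an assumption, not confirmed by the source):

```python
import pandas as pd

dtm = [[1, 0, 1], [1, 1, 0]]
keywords = ["malaria control", "bed nets", "ghana"]
doc = ["Malaria control in rural Ghana", "Bed nets and malaria control"]

# Rows are documents, columns are keywords.
dtm_df = pd.DataFrame(dtm, index=doc, columns=keywords)
print(dtm_df.loc["Malaria control in rural Ghana", "ghana"])  # 1
```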
-
extract_terms(DataFrame, min_len=2, max_len=4)
This method uses the RAKE algorithm to extract keywords from the text column of the DataFrame of naive search results.
- Parameters
DataFrame – a pandas DataFrame of naive search results
min_len – minimum keyword length
max_len – maximum keyword length
- Returns
a list combining the extracted keywords and the author keywords
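RAKE (Rapid Automatic Keyword Extraction) splits text into candidate phrases at stopwords and punctuation, scores each word as degree / frequency, and scores each phrase as the sum of its word scores. The standard-library sketch below illustrates that algorithm and is not the package's code: the tiny STOPWORDS set is illustrative, and reading min_len/max_len as phrase length in words is an assumption.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list; a real run would use a full list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def rake(text, min_len=2, max_len=4):
    """Minimal RAKE sketch: returns (phrase, score) pairs, best first."""
    phrases = []
    # Punctuation delimits fragments; stopwords split fragments into phrases.
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word scores use every candidate phrase: degree / frequency.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Only phrases within the length limits are scored and returned.
    scored = {}
    for phrase in phrases:
        if min_len <= len(phrase) <= max_len:
            scored[" ".join(phrase)] = sum(degree[w] / freq[w] for w in phrase)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


text = "Insecticide treated nets reduce malaria in young children. Treated nets are cheap."
print(rake(text))  # [('treated nets', 7.0), ('young children', 4.0)]
```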
-
find_cutoff(g, method, importance_method, degrees=2, knot_num=1, percent=0.2, diagnostics=False)
This method finds the cutoff for a graph network by cutting off its degree distribution with either the cumulative or the spline method.
- Parameters
g – graph
method – method for finding the cutoff (cumulative or spline)
importance_method – method to use to check node importance
degrees – spline degree
knot_num – spline number of knots
percent – cutoff percentage for cumulative method
- Returns
cutoff strengths
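A minimal sketch of one reading of the cumulative method, assuming percent is the share of total node strength that the strongest nodes must account for and that the cutoff is the strength of the node at which that share is reached. The exact semantics in the package may differ; cumulative_cutoff is a hypothetical name.

```python
def cumulative_cutoff(strengths, percent=0.2):
    """Hypothetical sketch: strength at which the strongest nodes first
    account for `percent` of the total strength (an assumed reading)."""
    if not strengths:
        raise ValueError("no node strengths given")
    if not 0 < percent <= 1:
        raise ValueError("percent must be in (0, 1]")
    ordered = sorted(strengths, reverse=True)
    target = percent * sum(ordered)
    running = 0
    for value in ordered:
        running += value
        if running >= target:
            return value
    return ordered[-1]  # only reached through floating-point rounding


strengths = {"malaria": 10, "bed nets": 6, "ghana": 2, "children": 2}
print(cumulative_cutoff(list(strengths.values()), percent=0.6))  # 6
```

Under this reading, nodes with strength of at least 6 would be kept.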
-
find_knots(x, y, degrees, knot_num=1)
This method finds the knots of a spline fitted to two sets of values.
- Parameters
x – x values
y – y values
degrees – degrees of the spline
knot_num – number of knots of the spline
- Returns
t = knots, c = spline coefficients, k = B-spline order
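Knot placement is the first half of such a fit. The sketch below only builds a clamped knot vector: the boundary values are repeated degree + 1 times and knot_num interior knots sit at evenly spaced quantiles of x. The quantile placement is an assumption, and the coefficients c (which would come from a least-squares fit against y) are not computed here. scipy.interpolate.splrep is one library routine that returns a complete (t, c, k) tuple.

```python
def clamped_knots(x, degree, knot_num=1):
    """Hypothetical sketch: clamped B-spline knot vector for data x.

    Boundary knots are repeated degree + 1 times; knot_num interior
    knots sit at evenly spaced quantiles of x (an assumption).
    """
    xs = sorted(x)
    if len(xs) < 2:
        raise ValueError("need at least two x values")
    interior = []
    for i in range(1, knot_num + 1):
        # Linearly interpolated quantile at fraction i / (knot_num + 1).
        pos = i * (len(xs) - 1) / (knot_num + 1)
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, len(xs) - 1)
        interior.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return [xs[0]] * (degree + 1) + interior + [xs[-1]] * (degree + 1)


t = clamped_knots([0, 1, 2, 3, 4], degree=2, knot_num=1)
print(t)  # [0, 0, 0, 2.0, 4, 4, 4]
```

For degree k the spline then has len(t) - k - 1 coefficients, here 4.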
-
fit_spline(t, c, k)
This method fits the knots t, the spline coefficients c and the B-spline order k to a B-spline.
- Parameters
t – knots
c – spline coefficients
k – B-spline order
- Returns
fitted B-spline
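Given (t, c, k), a B-spline can be evaluated at a point with de Boor's algorithm. The standard-library sketch below is for illustration; the package likely relies on a library routine such as scipy.interpolate.BSpline(t, c, k) instead, which is an assumption. Like SciPy, it treats k as the polynomial degree.

```python
def de_boor(x, t, c, k):
    """Evaluate sum_i c[i] * B_{i,k}(x) with de Boor's algorithm."""
    n = len(t) - k - 1  # number of basis functions / coefficients
    if len(c) < n:
        raise ValueError("need len(t) - k - 1 coefficients")
    if not t[k] <= x <= t[n]:
        raise ValueError("x lies outside the base interval [t[k], t[n]]")
    # Find the knot span l with t[l] <= x < t[l + 1]; on the right end
    # point, fall back to the last non-empty span.
    if x == t[n]:
        l = max(i for i in range(k, n) if t[i] < t[i + 1])
    else:
        l = max(i for i in range(k, n) if t[i] <= x)
    # Start from the k + 1 coefficients that affect this span.
    d = [c[j + l - k] for j in range(k + 1)]
    for r in range(1, k + 1):
        for j in range(k, r - 1, -1):
            alpha = (x - t[j + l - k]) / (t[j + 1 + l - r] - t[j + l - k])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[k]


# A clamped quadratic with these coefficients reproduces f(x) = x on [0, 1].
t = [0, 0, 0, 1, 1, 1]
c = [0.0, 0.5, 1.0]
print(round(de_boor(0.3, t, c, 2), 10))  # 0.3
```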
-
get_centrality(g, method)
This method evaluates the node importance of a graph.
- Parameters
g – the graph whose node importance is to be found
method – the method for finding node importance: degree, closeness, betweenness or eigenvalue
- Returns
a dictionary containing nodes with their importance
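For the degree option, a minimal sketch on a plain adjacency dict: each node's degree divided by n - 1, which is the normalisation networkx.degree_centrality uses for graphs with two or more nodes. In networkx the other options would correspond to closeness_centrality, betweenness_centrality and, assuming "eigenvalue" means eigenvector centrality, eigenvector_centrality; how the package maps its method names is not stated.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


adjacency = {
    "malaria": {"bed nets", "ghana", "children"},
    "bed nets": {"malaria", "ghana"},
    "ghana": {"malaria", "bed nets"},
    "children": {"malaria"},
}
print(degree_centrality(adjacency))
# {'malaria': 1.0, 'bed nets': 0.6666666666666666, 'ghana': 0.6666666666666666, 'children': 0.3333333333333333}
```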
-
get_keywords(g, importance_method, cutoff_strength, save_keywords=True, save_directory=None, draw_reduced_graph=False)
- Parameters
g – graph
importance_method – method to use to check node importance
cutoff_strength – the strength at which to cut off the graph
save_keywords – if True, saves the keywords to a .csv file
save_directory – the path to a directory where the suggested keywords will be saved if save_keywords is set to True
draw_reduced_graph – if True, draws the reduced graph
- Returns
suggested keywords for final review
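The reduction step can be pictured as keeping the nodes whose importance reaches the cutoff. A minimal sketch, assuming an inclusive comparison (>=) and strongest-first ordering; select_keywords is a hypothetical name.

```python
def select_keywords(importance, cutoff_strength):
    """Hypothetical sketch: keep nodes whose importance reaches the cutoff,
    strongest first."""
    kept = [(node, score) for node, score in importance.items() if score >= cutoff_strength]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [node for node, _ in kept]


importance = {"malaria": 10, "bed nets": 6, "ghana": 2, "children": 2}
print(select_keywords(importance, cutoff_strength=6))  # ['malaria', 'bed nets']
```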
-
import_naive_results(path, save_dataset=False, save_directory=None, clean_dataset=False)
This method imports the search results from a specified path.
- Parameters
clean_dataset – if True, de-duplicates the search results after importing
save_dataset – if True, saves the full search results to a .csv file
save_directory – the path to a directory where the search results will be saved if save_dataset is set to True
path – the path containing the naive search results files
- Returns
a pandas data frame consisting of assembled search results
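A standard-library sketch of assembling results from several export files, assuming they are CSV files in one directory (the file names and columns below are hypothetical). The package returns a pandas data frame; reading each file with pandas.read_csv and joining them with pandas.concat would be the pandas counterpart.

```python
import csv
import tempfile
from pathlib import Path


def load_naive_results(path):
    """Hypothetical sketch: read every .csv file in `path` into one list of rows."""
    rows = []
    for csv_file in sorted(Path(path).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_file.name  # remember where each record came from
                rows.append(row)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "scopus.csv").write_text("title,year\nMalaria in Ghana,2019\n", encoding="utf-8")
    Path(tmp, "wos.csv").write_text("title,year\nBed nets,2020\n", encoding="utf-8")
    records = load_naive_results(tmp)

print([r["title"] for r in records])  # ['Malaria in Ghana', 'Bed nets']
```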
-
make_importance(g, importance_method)
This method creates a data frame of node names with their importance and their rank (index) from a graph.
- Parameters
g – graph
importance_method – method to use to check node importance
- Returns
a data frame of rank, node importance and node name
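A minimal sketch of the rank / importance / name rows, assuming nodes are ranked from most to least important starting at 1; the package may order or index them differently. rank_importance is a hypothetical name.

```python
def rank_importance(importance):
    """Hypothetical sketch: (rank, importance, node) rows, most important first."""
    ordered = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(rank, score, node) for rank, (node, score) in enumerate(ordered, start=1)]


for row in rank_importance({"ghana": 2, "malaria": 10, "bed nets": 6}):
    print(row)
# (1, 10, 'malaria')
# (2, 6, 'bed nets')
# (3, 2, 'ghana')
```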
-
plot_degree_distribution(g, save_plot=False, save_directory=None)
This method plots the degree distribution of the graph.
- Parameters
g – graph whose degree distribution is to be plotted
save_plot – if True, saves the plot to a .png file
save_directory – the path to a directory where the plot will be saved if save_plot is set to True
- Returns
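The data behind such a plot can be computed without a plotting library. This sketch assumes the distribution is drawn as node degrees sorted in ascending order against their rank, which is one common presentation; with matplotlib, plt.plot(ranks, degrees) would then draw it.

```python
def degree_sequence(adjacency):
    """Node degrees sorted ascending, paired with their 1-based rank."""
    degrees = sorted(len(nbrs) for nbrs in adjacency.values())
    return list(range(1, len(degrees) + 1)), degrees


adjacency = {
    "malaria": {"bed nets", "ghana", "children"},
    "bed nets": {"malaria", "ghana"},
    "ghana": {"malaria", "bed nets"},
    "children": {"malaria"},
}
ranks, degrees = degree_sequence(adjacency)
print(ranks, degrees)  # [1, 2, 3, 4] [1, 2, 2, 3]
```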
-
plot_degree_histogram(g, save_plot=False, save_directory=None)
This method plots a histogram of the graph's node degrees.
- Parameters
g – graph whose degree distribution is to be plotted
save_plot – if True, saves the plot to a .png file
save_directory – the path to a directory where the plot will be saved if save_plot is set to True
- Returns
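The histogram itself is just a count of nodes per degree. A standard-library sketch on a plain adjacency dict (an assumed input format); networkx.degree_histogram returns the same counts as a list indexed by degree.

```python
from collections import Counter


def degree_histogram(adjacency):
    """Count how many nodes have each degree, keyed by degree in ascending order."""
    counts = Counter(len(nbrs) for nbrs in adjacency.values())
    return dict(sorted(counts.items()))


adjacency = {
    "malaria": {"bed nets", "ghana", "children"},
    "bed nets": {"malaria", "ghana"},
    "ghana": {"malaria", "bed nets"},
    "children": {"malaria"},
}
print(degree_histogram(adjacency))  # {1: 1, 2: 2, 3: 1}
```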
-