CLI reference¶
This page is generated during the Sphinx build from each command module's docstring and the output of dawgtools <subcommand> -h.
extract_batch¶
Extract features from one or more input files.
Environment¶
- OPENAI_API_KEY must be set.
- OPENAI_BASE_URL can be used to set a custom API base URL.
Set environment variables:
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
Example¶
Given a schema file (e.g., developed using toolbuilder) and a directory of text
files named input_texts, extract features into features.csv:
dawgtools extract_batch schema.json -d input_texts -o features.csv
Caching¶
A cache directory is created to store intermediate results and avoid re-querying the model for files that have already been processed. New model queries are performed each time the schema file changes.
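The cache-invalidation behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not dawgtools' actual implementation: the function name and key format are assumptions. The idea is that a hash of the schema file is baked into each cache key, so editing the schema invalidates prior entries while unchanged schema + input pairs reuse cached results.

```python
import hashlib
import pathlib
import tempfile

def cache_key(schema_path, input_path):
    # Hypothetical cache key: a digest of the schema file's contents is
    # part of the key, so any schema edit forces a fresh model query.
    schema_bytes = pathlib.Path(schema_path).read_bytes()
    digest = hashlib.sha256(schema_bytes).hexdigest()[:12]
    return f"{pathlib.Path(input_path).stem}-{digest}.json"

with tempfile.TemporaryDirectory() as d:
    schema = pathlib.Path(d, "schema.json")
    schema.write_text('{"type": "function"}')
    key_before = cache_key(schema, "note1.txt")   # first run
    key_same = cache_key(schema, "note1.txt")     # same schema: cache hit
    schema.write_text('{"type": "function", "name": "extract_features"}')
    key_after = cache_key(schema, "note1.txt")    # edited schema: new key
```

Here `key_before` equals `key_same` (a cache hit), while `key_after` differs, so the edited schema triggers a new model query.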
Schema format¶
The schema file should be a JSON file defining a tool compatible with the OpenAI function calling API. See:
https://platform.openai.com/docs/guides/function-calling
Example schema (from the OpenAI documentation):
{
"type": "function",
"name": "extract_features",
"description": "Extract features from text",
"parameters": {
"type": "object",
"properties": {
"feature1": {
"type": "string",
"description": "Description of feature1"
},
"feature2": {
"type": "integer",
"description": "Description of feature2"
}
},
"required": ["feature1", "feature2"]
}
}
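Before running a large batch, the schema file can be sanity-checked with a few lines of Python. This check is an illustration based on the example layout above, not part of dawgtools: it verifies that the JSON parses and that every field listed in "required" is actually defined under "properties".

```python
import json

schema_text = """
{
  "type": "function",
  "name": "extract_features",
  "description": "Extract features from text",
  "parameters": {
    "type": "object",
    "properties": {
      "feature1": {"type": "string", "description": "Description of feature1"},
      "feature2": {"type": "integer", "description": "Description of feature2"}
    },
    "required": ["feature1", "feature2"]
  }
}
"""

schema = json.loads(schema_text)  # fails loudly on malformed JSON
props = schema["parameters"]["properties"]
# every required field should actually be defined under "properties"
missing = [f for f in schema["parameters"]["required"] if f not in props]
assert schema["type"] == "function"
assert not missing
```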
dawgtools extract_batch -h¶
usage: dawgtools extract_batch [-h] [-i INFILE] [-d DIRNAME] [-p PROMPT]
[-o OUTFILE] [-m MODEL] [--cache-dir CACHE_DIR]
[-n]
schema
Extract features from one or more input files.
Environment
-----------
- ``OPENAI_API_KEY`` must be set.
- ``OPENAI_BASE_URL`` can be used to set a custom API base URL.
Set environment variables::
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
Example
-------
Given a schema file (e.g., developed using toolbuilder) and a directory of text
files named ``input_texts``, extract features into ``features.csv``::
dawgtools extract_batch schema.json -d input_texts -o features.csv
Caching
-------
A cache directory is created to store intermediate results and avoid re-querying
the model for files that have already been processed. New model queries are
performed each time the schema file changes.
Schema format
-------------
The schema file should be a JSON file defining a tool compatible with the OpenAI
function calling API. See:
https://platform.openai.com/docs/guides/function-calling
Example schema (from the OpenAI documentation):
.. code-block:: json
{
"type": "function",
"name": "extract_features",
"description": "Extract features from text",
"parameters": {
"type": "object",
"properties": {
"feature1": {
"type": "string",
"description": "Description of feature1"
},
"feature2": {
"type": "integer",
"description": "Description of feature2"
}
},
"required": ["feature1", "feature2"]
}
}
positional arguments:
schema json file with feature schema
options:
-h, --help show this help message and exit
-i, --infile INFILE A single input file
-d, --dirname DIRNAME
A directory of input files
-p, --prompt PROMPT Optional file with additional prompt content
-o, --outfile OUTFILE
Output file
-m, --model MODEL Model name [gpt-5.2]
--cache-dir CACHE_DIR
Directory containing cached results
[extract_batch_cache]
-n, --no-cache
query¶
Execute an SQL query.
Renders a query template string into a parameterized SQL query.
Use a combination of Python string formatting directives (for variable substitution) and Jinja2 expressions (for conditional expressions).
For example:
$ dawgtools -v query -q "select 'foo' as col1, %(barval)s as col2" -p barval=bar
{"col1": "foo", "col2": "bar"}
The command may be preceded by the creation and loading of a temporary table containing mrns that can be referenced in the query. For example:
$ cat mrns.txt
fee
fie
fo
fum
$ dawgtools query --mrns mrns.txt -q 'select * from #mrns'
{"mrn": "fee"}
{"mrn": "fie"}
{"mrn": "fo"}
{"mrn": "fum"}
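The two-stage rendering described above (Jinja2 for conditionals, then %-style parameter substitution) can be illustrated with plain Python. This sketch skips the Jinja2 pass and inlines the parameter values itself; in practice a database driver performs this substitution, with proper escaping:

```python
# Illustrative only: dawgtools passes %(name)s placeholders plus a
# parameter dict to the database driver; here we inline the values
# (via repr, without SQL escaping) just to show the mechanics.
template = "select 'foo' as col1, %(barval)s as col2"
params = {"barval": "bar"}

rendered = template % {k: repr(v) for k, v in params.items()}
```

After substitution, `rendered` is `select 'foo' as col1, 'bar' as col2`, matching the example output above.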
dawgtools query -h¶
usage: dawgtools query [-h] [-q QUERY] [-i INFILE] [-n {path_reports,notes}]
[-p PARAMS] [-P PARAMS_FILE] [--mrns FILE]
[--temp-schema FILE] [--temp-data FILE] [-o OUTFILE]
[-f {jsonl,json,json-rows,csv}] [-x]
Execute an sql query.
Renders a query template string into a parameterized sql query.
Use a combination of python string formatting directives (for variable
substitution) and jinja2 expressions (for conditional expressions).
For example:
$ dawgtools -v query -q "select 'foo' as col1, %(barval)s as col2" -p barval=bar
{"col1": "foo", "col2": "bar"}
The command may be preceded by the creation and loading of a temporary table
containing mrns that can be referenced in the query. For example:
$ cat mrns.txt
fee
fie
fo
fum
$ dawgtools query --mrns mrns.txt -q 'select * from #mrns'
{"mrn": "fee"}
{"mrn": "fie"}
{"mrn": "fo"}
{"mrn": "fum"}
options:
-h, --help show this help message and exit
-x, --dry-run Print the rendered query and exit
inputs:
-q, --query QUERY sql command
-i, --infile INFILE Input file containing an sql command
-n, --query-name {path_reports,notes}
name of an sql query
-p, --params PARAMS One or more variable value pairs in the form -p
var=val; these are used as parameters when rendering
the query.
-P, --params-file PARAMS_FILE
json file containing parameter values
temptable:
--mrns FILE A file containing whitespace-delimited mrns to be
loaded into a temporary table '#mrns(mrn
varchar(102))' before the query.
--temp-schema FILE File containing schema for a temporary table to be
created before running the query.
--temp-data FILE CSV file with columns corresponding to the schema
containing data to load into the temporary table
before running the query. Requires --temp-schema.
Columns not in the schema are ignored.
outputs:
-o, --outfile OUTFILE
Output file name; uses gzip compression if ends with
.gz or stdout if not provided.
-f, --format {jsonl,json,json-rows,csv}
Output format [jsonl]