Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FederatedQueryPlanner #2216

Merged
merged 75 commits into from
Sep 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
75 commits
Select commit Hold shift + click to select a range
cfefd87
Enrich 'i' and 'ri' rows in metadata table with event date
lbschanno Sep 7, 2023
0e4f806
Merge branch 'integration' into task/datedIndexMetadata
ivakegg Sep 29, 2023
f2aa20b
Merge branch 'integration' into task/datedIndexMetadata
ivakegg Oct 4, 2023
ab9ee4c
Merge branch 'integration' into task/datedIndexMetadata
ivakegg Oct 17, 2023
43aeae8
Merge branch 'integration' into task/datedIndexMetadata
lbschanno Nov 7, 2023
83014c1
Merge branch 'integration' into task/datedIndexMetadata
ivakegg Nov 24, 2023
ab14fe2
Add counts to 'i' and 'ri' rows
lbschanno Dec 19, 2023
9aad9f6
Merge branch 'integration' into task/datedIndexMetadata
lbschanno Jan 10, 2024
da6ee69
Initial federated query planner implementation
lbschanno Oct 25, 2023
b40201b
code formatting
lbschanno Jan 12, 2024
8ca62b3
Fixed issues with FederatedQueryIterable
lbschanno Jan 13, 2024
59d3be9
Fix test failures
lbschanno Jan 13, 2024
bde374d
Fix failing tests
lbschanno Jan 13, 2024
461a526
Additional test fixes
lbschanno Jan 13, 2024
7784b4c
pr feedback
lbschanno Jan 18, 2024
b288196
Use new MetadataHelper function version
lbschanno Jan 23, 2024
c34a543
Extract fields to filter index holes
lbschanno Jan 24, 2024
e0ef160
Correct logic for determining sub date ranges
lbschanno Jan 24, 2024
9ea3a84
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Jan 24, 2024
7052bcb
Remove unnecessary check
lbschanno Jan 24, 2024
d44e0f0
code formatting
lbschanno Jan 24, 2024
fca2fff
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Jan 27, 2024
e341c72
Add check for null query model
lbschanno Jan 27, 2024
8779a99
Limit config arg to function scope
lbschanno Jan 29, 2024
a875a20
Update metadata-utils submodule commit
lbschanno Jan 29, 2024
cd20ca5
code formatting
lbschanno Jan 29, 2024
56e3bad
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Feb 6, 2024
39ecec4
Fix failing tests
lbschanno Feb 6, 2024
906f3ee
Additional test fixes
lbschanno Feb 6, 2024
9478a62
Ensure all original tests pass
lbschanno Feb 6, 2024
1bf9201
Add federated planner tests and chained schedulers
lbschanno Feb 27, 2024
941607f
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Feb 27, 2024
c019c78
pr feedback
lbschanno Feb 29, 2024
9c72dab
metadata-utils 3.0.3 tag
ivakegg Mar 6, 2024
e5af693
Fixed the index hole data ingest to set appropriate time stamps on th…
ivakegg Mar 6, 2024
0842cec
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Mar 6, 2024
d774f6f
Updated applyModel to use the passed in script
ivakegg Mar 6, 2024
3da641d
Remove unneeded changes
lbschanno Mar 6, 2024
235dad8
Make FederatedQueryPlanner the default
lbschanno Mar 7, 2024
b63b24c
Restore original log4j.properties
lbschanno Mar 8, 2024
0c4ccff
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Mar 8, 2024
5515fdc
code formatting
lbschanno Mar 8, 2024
fede1a0
Fix QueryPlanTest
lbschanno Mar 9, 2024
6ed7a39
Updated to test with teardown
ivakegg Mar 9, 2024
1d506e3
Test debugging edits
lbschanno Mar 11, 2024
0578bc2
Updated formatting
ivakegg Mar 13, 2024
e9a76e6
Concatenate sub-plans
lbschanno Mar 13, 2024
195dabe
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Mar 13, 2024
d6354ce
Make FederatedQueryPlanner implement Cloneable
lbschanno Mar 14, 2024
c2a57f0
code formatting
lbschanno Mar 14, 2024
6e1f131
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Jun 3, 2024
a2da80c
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Jun 11, 2024
49c4bc6
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Jun 13, 2024
53a3ea8
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Jun 25, 2024
cbe8d87
Merge remote-tracking branch 'origin/integration' into task/federated…
ivakegg Jul 1, 2024
74c9db8
* Updated with metadata-utils 4.0.5 (index markers and avoid non-inde…
ivakegg Jul 2, 2024
6608e5f
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Jul 2, 2024
390ae48
* Allow subclasses of ShardQueryConfiguration
ivakegg Jul 5, 2024
5b52f8b
Merge remote-tracking branch 'origin/integration' into task/federated…
ivakegg Jul 5, 2024
5516f73
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Jul 8, 2024
8370ea2
Updated to throw a NoResultsException for am empty query.
ivakegg Jul 8, 2024
a9cb2fa
Merge branch 'task/federatedQueryPlanner' of github.com:NationalSecur…
ivakegg Jul 8, 2024
fece27b
import reorg
ivakegg Jul 8, 2024
9aca68a
Updated to avoid expanding unfielded if disabled, and to assume no in…
ivakegg Jul 8, 2024
08ac120
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Aug 22, 2024
a3ba76d
Add tests for default query planner with ne and not-eq
lbschanno Aug 26, 2024
cc1873d
Revert changes to test data format
lbschanno Aug 26, 2024
591f252
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Aug 26, 2024
c09cee1
Revert changes to log4j.properties
lbschanno Aug 26, 2024
0ed1f67
Ensure query plan updated after any exception type
lbschanno Aug 26, 2024
accc7bc
Merge branch 'integration' into task/federatedQueryPlanner
lbschanno Aug 27, 2024
fc0760b
Revert all changes to test data format
lbschanno Aug 27, 2024
c62e331
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Aug 29, 2024
2a97ede
Merge branch 'integration' into task/federatedQueryPlanner
ivakegg Sep 4, 2024
12cd61e
Merge branch 'integration' into task/federatedQueryPlanner
hgklohr Sep 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -96,25 +96,36 @@ public GenericQueryConfiguration(BaseQueryLogic<?> configuredLogic) {
this(configuredLogic.getConfig());
}

public GenericQueryConfiguration(GenericQueryConfiguration genericConfig) {
this.setQuery(genericConfig.getQuery());
this.setCheckpointable(genericConfig.isCheckpointable());
this.setBaseIteratorPriority(genericConfig.getBaseIteratorPriority());
this.setBypassAccumulo(genericConfig.getBypassAccumulo());
this.setAccumuloPassword(genericConfig.getAccumuloPassword());
this.setConnPoolName(genericConfig.getConnPoolName());
this.setAuthorizations(genericConfig.getAuthorizations());
this.setBeginDate(genericConfig.getBeginDate());
this.setClient(genericConfig.getClient());
this.setEndDate(genericConfig.getEndDate());
this.setMaxWork(genericConfig.getMaxWork());
this.setQueries(genericConfig.getQueries());
this.setQueriesIter(genericConfig.getQueriesIter());
this.setQueryString(genericConfig.getQueryString());
this.setTableName(genericConfig.getTableName());
this.setReduceResults(genericConfig.isReduceResults());
this.setTableConsistencyLevels(genericConfig.getTableConsistencyLevels());
this.setTableHints(genericConfig.getTableHints());
@SuppressWarnings("CopyConstructorMissesField")
public GenericQueryConfiguration(GenericQueryConfiguration other) {
copyFrom(other);
}

/**
* Deeply copies over all fields from the given {@link GenericQueryConfiguration} to this {@link GenericQueryConfiguration}.
*
* @param other
* the {@link GenericQueryConfiguration} to copy values from
*/
public void copyFrom(GenericQueryConfiguration other) {
this.setQuery(other.getQuery());
this.setCheckpointable(other.isCheckpointable());
this.setBaseIteratorPriority(other.getBaseIteratorPriority());
this.setBypassAccumulo(other.getBypassAccumulo());
this.setAccumuloPassword(other.getAccumuloPassword());
this.setConnPoolName(other.getConnPoolName());
this.setAuthorizations(other.getAuthorizations());
this.setBeginDate(other.getBeginDate());
this.setClient(other.getClient());
this.setEndDate(other.getEndDate());
this.setMaxWork(other.getMaxWork());
this.setQueries(other.getQueries());
this.setQueriesIter(other.getQueriesIter());
this.setQueryString(other.getQueryString());
this.setTableName(other.getTableName());
this.setReduceResults(other.isReduceResults());
this.setTableConsistencyLevels(other.getTableConsistencyLevels());
this.setTableHints(other.getTableHints());
}

public Collection<QueryData> getQueries() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -492,6 +492,12 @@ public class ShardQueryConfiguration extends GenericQueryConfiguration implement
*/
private boolean sortQueryByCounts = false;

/**
* The minimum percentage threshold that the count for an index row must meet compared to the count for the corresponding frequency row in the metadata
* table in order to NOT be considered a field index hole. The value must be between 0.0-1.0, where 1.0 is equivalent to 100%.
*/
private double fieldIndexHoleMinThreshold = 1.0d;

/**
* Default constructor
*/
Expand All @@ -506,10 +512,20 @@ public ShardQueryConfiguration() {
* @param other
* - another ShardQueryConfiguration instance
*/
@SuppressWarnings("CopyConstructorMissesField")
public ShardQueryConfiguration(ShardQueryConfiguration other) {
copyFrom(other);
}

/**
* Deeply copies over all fields from the given {@link ShardQueryConfiguration} to this {@link ShardQueryConfiguration}.
*
* @param other
* the {@link ShardQueryConfiguration} to copy values from
*/
public void copyFrom(ShardQueryConfiguration other) {
// GenericQueryConfiguration copy first
super(other);
super.copyFrom(other);

// ShardQueryConfiguration copy
this.setCheckpointable(other.isCheckpointable());
Expand Down Expand Up @@ -716,6 +732,7 @@ public ShardQueryConfiguration(ShardQueryConfiguration other) {
this.setUseTermCounts(other.getUseTermCounts());
this.setSortQueryBeforeGlobalIndex(other.isSortQueryBeforeGlobalIndex());
this.setSortQueryByCounts(other.isSortQueryByCounts());
this.setFieldIndexHoleMinThreshold(other.getFieldIndexHoleMinThreshold());
}

/**
Expand Down Expand Up @@ -2055,6 +2072,14 @@ public QueryStopwatch getTimers() {
return timers;
}

public void setTimers(QueryStopwatch timers) {
this.timers = timers;
}

public void appendTimers(QueryStopwatch timers) {
this.timers.appendTimers(timers);
}

public ASTJexlScript getQueryTree() {
return queryTree;
}
Expand Down Expand Up @@ -2688,6 +2713,14 @@ public void setRebuildDatatypeFilterPerShard(boolean rebuildDatatypeFilterPerSha
this.rebuildDatatypeFilterPerShard = rebuildDatatypeFilterPerShard;
}

public double getFieldIndexHoleMinThreshold() {
return fieldIndexHoleMinThreshold;
}

public void setFieldIndexHoleMinThreshold(double fieldIndexHoleMinThreshold) {
this.fieldIndexHoleMinThreshold = fieldIndexHoleMinThreshold;
}

public boolean getReduceIngestTypes() {
return reduceIngestTypes;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -340,6 +340,7 @@ protected DefaultQueryPlanner(DefaultQueryPlanner other) {
setSourceLimit(other.sourceLimit);
setPushdownThreshold(other.getPushdownThreshold());
setVisitorManager(other.getVisitorManager());
setTransformRules(other.getTransformRules() == null ? null : new ArrayList<>(other.transformRules));
}

public void setMetadataHelper(final MetadataHelper metadataHelper) {
Expand Down Expand Up @@ -1923,17 +1924,23 @@ protected ASTJexlScript parseQueryAndValidatePattern(String query, TraceStopwatc
if (log.isTraceEnabled()) {
log.trace("Stack trace for overflow " + soe);
}
stopwatch.stop();
if (stopwatch != null) {
stopwatch.stop();
}
PreConditionFailedQueryException qe = new PreConditionFailedQueryException(DatawaveErrorCode.QUERY_DEPTH_OR_TERM_THRESHOLD_EXCEEDED, soe);
log.warn(qe);
throw new DatawaveFatalQueryException(qe);
} catch (ParseException e) {
stopwatch.stop();
if (stopwatch != null) {
stopwatch.stop();
}
BadRequestQueryException qe = new BadRequestQueryException(DatawaveErrorCode.UNPARSEABLE_JEXL_QUERY, e, MessageFormat.format("Query: {0}", query));
log.warn(qe);
throw new DatawaveFatalQueryException(qe);
} catch (PatternSyntaxException e) {
stopwatch.stop();
if (stopwatch != null) {
stopwatch.stop();
}
BadRequestQueryException qe = new BadRequestQueryException(DatawaveErrorCode.INVALID_REGEX, e, MessageFormat.format("Query: {0}", query));
log.warn(qe);
throw new DatawaveFatalQueryException(qe);
Expand Down Expand Up @@ -2977,6 +2984,10 @@ public String getPlannedScript() {
return plannedScript;
}

public void setPlannedScript(String plannedScript) {
this.plannedScript = plannedScript;
}

protected Multimap<String,Type<?>> configureIndexedAndNormalizedFields(MetadataHelper metadataHelper, ShardQueryConfiguration config,
ASTJexlScript queryTree) throws DatawaveQueryException {
// Fetch the mapping of fields to Types from the DatawaveMetadata table
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
package datawave.query.planner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import datawave.core.query.configuration.QueryData;
import datawave.query.CloseableIterable;

/**
* Implementation of {@link CloseableIterable} intended to be used by {@link FederatedQueryPlanner}. This iterable
*/
public class FederatedQueryIterable implements CloseableIterable<QueryData> {

private final List<CloseableIterable<QueryData>> iterables = new ArrayList<>();

/**
* Add an iterable to this {@link FederatedQueryIterable}.
*
* @param iterable
* the iterable to add
*/
public void addIterable(CloseableIterable<QueryData> iterable) {
if (iterable != null) {
iterables.add(iterable);
}
}

/**
* Closes and clears each iterable in this {@link FederatedQueryIterable}.
*
* @throws IOException
* if an error occurred when closing an iterable
*/
@Override
public void close() throws IOException {
for (CloseableIterable<QueryData> iterable : iterables) {
iterable.close();
}
iterables.clear();
}

/**
* Returns an iterator that will iterate over the {@link QueryData} returned by each iterable in this {@link FederatedQueryIterable}.
*
* @return the iterator
*/
@Override
public Iterator<QueryData> iterator() {
return new Iter();
}

/**
* Iterator implementation that provides the ability to iterate over each {@link QueryData} of the iterables in {@link #iterables}.
*/
private class Iter implements Iterator<QueryData> {

// Iterator that traverses over the iterables.
private final Iterator<CloseableIterable<QueryData>> iterableIterator = iterables.iterator();

// The current sub iterator.
private Iterator<QueryData> currentSubIterator = null;

@Override
public boolean hasNext() {
seekToNextAvailableQueryData();
return currentSubIterator != null && currentSubIterator.hasNext();
}

@Override
public QueryData next() {
return currentSubIterator.next();
}

/**
* Seek to the next sub-iterator that has a {@link QueryData} remaining in it.
*/
private void seekToNextAvailableQueryData() {
// If the current sub iterator is null, attempt to get the next available iterator, or return early if there are no more iterators.
if (currentSubIterator == null) {
if (iterableIterator.hasNext()) {
currentSubIterator = iterableIterator.next().iterator();
} else {
return;
}
}
// If the current sub iterator does not have any more elements remaining, move to the next sub iterator that does have elements.
if (!currentSubIterator.hasNext()) {
while (iterableIterator.hasNext()) {
// We must ensure we only ever call iterator() once on each sub-iterator.
currentSubIterator = iterableIterator.next().iterator();
if (currentSubIterator.hasNext()) {
return;
}
}
}
}
}
}
Loading
Loading