@DefinetlyNotAI
Owner

@DefinetlyNotAI DefinetlyNotAI commented Aug 31, 2025

Prerequisites

  • I have searched for duplicate or closed issues.
  • I have read the contributing guidelines.
  • I have followed the instructions in the wiki about contributions.
  • I have updated the documentation accordingly, if required.
  • I have tested my code with the --dev flag, if required.

PR Type

  • Bug fix
  • Deprecation Change
  • New feature
  • Refactoring
  • Documentation update
  • ⚠️ Breaking change ⚠️

Description

This pull request introduces significant updates to the Logicytics project, focusing on expanding hardware audit capabilities, updating configuration and model management, and making improvements to logging and code organization. The most important changes are grouped below:

New Hardware Audit Modules:

  • Added encrypted_drive_audit.py, a new script that analyzes Windows encrypted volumes, gathers BitLocker status, and generates a detailed report.
  • Added usb_history.py, a new module that extracts and logs historical USB device connection information from the Windows registry.

Configuration and Model Updates:

  • Updated CODE/config.ini to version 3.6.0, added new modules (encrypted_drive_audit.py, usb_history.py), changed model paths, and simplified VulnScan settings for improved clarity and maintainability. [1] [2]

IDE and Project Metadata:

  • Added .idea/csv-editor.xml to project files, specifying CSV editor settings for the report.csv file.

Codebase and Logging Improvements:

  • Exposed the config object in the main Logicytics package for easier access to configuration throughout the codebase.
  • Minor logging annotation added in Logicytics.py to improve code inspection and maintainability.

Motivation and Context

This release strengthens the forensic capabilities of Logicytics by adding USB history logging and BitLocker volume analysis, while also refining sensitive path scanning and logging. These changes improve reliability, maintainability, and readiness for future enhancements.

Credit

N/A

Issues Fixed

N/A

Summary by CodeRabbit

  • New Features

    • Added Windows utilities to audit encrypted volumes and collect USB device history; both produce timestamped, human-readable reports.
    • Introduced a streamlined sensitive-data scanner with threaded processing, per-file backups, and JSON/CSV reports.
  • Refactor

    • Simplified scanning pipeline and model integration for faster, more maintainable scans.
    • Re-exported config for easier access.
  • Chores

    • Bumped config to v3.6.0; added new model reference, threshold, and text-length setting.
  • Documentation

    • Marked 3.6.x as supported in security policy; removed the Data Extraction section from the README.

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
…atus

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
…ity; update PLANS.md for version tracking

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
… config.ini for new settings

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
…ain execution logic

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
…n.py

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
@DefinetlyNotAI DefinetlyNotAI self-assigned this Aug 31, 2025
Copilot AI review requested due to automatic review settings August 31, 2025 16:27
@DefinetlyNotAI DefinetlyNotAI added type/Documentation Improvements or additions to commentations request/Important New feature or request, top priority, for next update labels Aug 31, 2025
@DefinetlyNotAI DefinetlyNotAI added the type/Code Related to the Code part label Aug 31, 2025
@pull-request-size pull-request-size bot added the size/XL Huge size pr label Aug 31, 2025
@coderabbitai
Contributor

coderabbitai bot commented Aug 31, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds IDE CSV config, bumps system config to 3.6.0, re-exports config, adds Windows collectors (encrypted_drive_audit, usb_history), fully rewrites vulnscan.py to a threaded scanner with a small ML classifier and reporting/backups, inserts a noinspection, and updates docs (README, PLANS, SECURITY).

Changes

| Cohort / File(s) | Summary |
|---|---|
| IDE config: .idea/csv-editor.xml | New CsvFileAttributes mapping for \CODE\report.csv with separator ,; project version 4. |
| Core entry & package export: CODE/Logicytics.py, CODE/logicytics/__init__.py | Inserted # noinspection PyUnreachableCode before a ZIP call (no behavior change). Added config to __all__ to re-export it. |
| Configuration update: CODE/config.ini | Version bumped 3.5.1 → 3.6.0; updated files list (added encrypted_drive_audit.py, usb_history.py, new vulnscan model); consolidated VulnScan sub-sections into [VulnScan Settings] with threshold and model; removed unreadable_extensions and max_file_size_mb, added/renamed text_char_limit. |
| VulnScan overhaul: CODE/vulnscan.py | Full rewrite: replaces async scanner with thread-pooled pipeline, adds SimpleNN classifier + SentenceTransformer embeddings, new constants (paths, thresholds, device), process_file, scan_directory, main, per-file backups, and JSON/CSV reports. |
| New Windows tools: CODE/encrypted_drive_audit.py, CODE/usb_history.py | New scripts: encrypted volume audit (wmic, mountvol, manage-bde, PowerShell checks) and USB history collector (enumerates HKLM USBSTOR, reads last-write times and friendly names); both produce text reports and have __main__ guards. |
| Docs & release notes: README.md, PLANS.md, SECURITY.md | README: removed Data Extraction section. PLANS: updated tasks (v3.6.1, v4.0.0 note). SECURITY: added 3.6.x as supported. |
| Minor comment: CODE/logicytics/Flag.py | Updated TODO version string (v3.6.0 → v3.6.1) only; no code changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as vulnscan.py
  participant FS as File System
  participant Embed as SentenceTransformer
  participant Model as SimpleNN

  User->>App: Run __main__
  App->>FS: Walk SENSITIVE_PATHS
  par ThreadPool (NUM_WORKERS)
    App->>FS: Read file
    App->>App: Trim & split lines
    App->>Embed: Encode lines
    Embed-->>App: Embeddings
    App->>Model: Predict
    Model-->>App: Probabilities
    App->>FS: Backup sensitive files (if hit)
  end
  App->>FS: Write REPORT_JSON & REPORT_CSV
  App-->>User: Summary
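
The vulnscan flow in the diagram above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in CODE/vulnscan.py: `score_line` is a hypothetical stand-in for the SentenceTransformer + SimpleNN step, the `NUM_WORKERS` and `THRESHOLD` values are made up, and the backup and report-writing steps are left out.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_WORKERS = 4   # assumed value; the PR reads its own setting
THRESHOLD = 0.5   # assumed value


def score_line(line: str) -> float:
    """Hypothetical stand-in for the embedding + classifier step."""
    return 1.0 if "password" in line.lower() else 0.0


def process_file(filepath: str, root_dir: str):
    """Return a report entry if any line scores at or above THRESHOLD, else None."""
    try:
        with open(filepath, "r", encoding="utf-8", errors="ignore") as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except OSError:
        return None
    if not lines:
        return None
    max_prob = max(score_line(ln) for ln in lines)
    if max_prob < THRESHOLD:
        return None
    return {"file": os.path.relpath(filepath, root_dir), "probability": max_prob}


def scan_directory(root_dir: str) -> list:
    """Walk root_dir and score every file on a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        futures = [
            executor.submit(process_file, os.path.join(dirpath, name), root_dir)
            for dirpath, _, filenames in os.walk(root_dir)
            for name in filenames
        ]
        for future in as_completed(futures):
            entry = future.result()
            if entry is not None:
                results.append(entry)
    return results
```

Passing `root_dir` explicitly keeps `process_file` free of module-level state, which is the same direction the review comments on this PR push toward.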
sequenceDiagram
  autonumber
  actor User
  participant Audit as encrypted_drive_audit.py
  participant OS as Windows
  participant BL as manage-bde/PowerShell
  participant Report as win_encrypted_volume_report.txt

  User->>Audit: Run script
  Audit->>OS: Collect host/user/time info
  Audit->>OS: Run wmic / mountvol
  alt manage-bde available
    Audit->>BL: manage-bde -status (A:..Z:)
    BL-->>Audit: Drive statuses
  else
    Audit->>Audit: Log warning
  end
  alt PowerShell available
    Audit->>BL: Get-BitLockerVolume
    BL-->>Audit: Volume details
  else
    Audit->>Audit: Log warning
  end
  Audit->>Report: Append outputs
  Audit-->>User: Report path
sequenceDiagram
  autonumber
  actor User
  participant USB as usb_history.py
  participant Reg as Windows Registry
  participant File as usb_history.txt

  User->>USB: Run script
  USB->>Reg: Enumerate HKLM\\SYSTEM\\...\\USBSTOR
  loop Subkeys
    USB->>Reg: Get last-write time + FriendlyName
    Reg-->>USB: Data or fallback
    USB->>File: Append history entry
  end
  USB-->>User: Done (file path)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

type/Development, type/System

Poem

New tools land with a tiny ping,
Drives and USBs spill what they bring.
Vulnscan slimmed down, threads do the grind,
Models hum, copying hits they find.
Config bumped, reports saved — neat and aligned. ✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bdc9196 and bf9cf3f.

📒 Files selected for processing (1)
  • CODE/logicytics/Flag.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CODE/logicytics/Flag.py
Contributor

Copilot AI left a comment

Pull Request Overview

This PR updates Logicytics to version 3.6.0, introducing new hardware audit capabilities and significant improvements to the vulnerability scanning system. The update enhances the project's forensic analysis tools while improving code organization and maintainability.

  • Complete rewrite of the VulnScan module with improved neural network architecture and sentence transformers
  • Addition of two new audit modules for encrypted drive analysis and USB device history tracking
  • Configuration updates to support new modules and simplified settings structure

Reviewed Changes

Copilot reviewed 9 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| SECURITY.md | Added support for version 3.6.x |
| README.md | Removed detailed data extraction table to streamline documentation |
| PLANS.md | Updated roadmap items marking encrypted volume and USB history tasks as complete |
| CODE/vulnscan.py | Complete rewrite with sentence transformers, neural network model, and improved file processing |
| CODE/usb_history.py | New module for extracting USB device connection history from Windows registry |
| CODE/logicytics/__init__.py | Exposed config object for package-wide access |
| CODE/encrypted_drive_audit.py | New module for analyzing Windows encrypted volumes and BitLocker status |
| CODE/config.ini | Updated to v3.6.0 with new modules and simplified VulnScan settings |
| CODE/Logicytics.py | Added minor logging annotation |
Files not reviewed (1)
  • .idea/csv-editor.xml: Language not supported

@qltysh
Contributor

qltysh bot commented Aug 31, 2025

All good ✅

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
SECURITY.md (1)

55-55: Fix the garbled "2�5" text.

Typo/encoding issue in the SLA. Replace with ASCII-safe "2-5".

Apply this diff:

- - **Acknowledgment**: Upon receiving your report, we will acknowledge receipt within 2�5 business days.
+ - **Acknowledgment**: Upon receiving your report, we will acknowledge receipt within 2-5 business days.
CODE/vulnscan.py (1)

188-197: Reports get overwritten for each SENSITIVE_PATH — aggregate once.

Right now report.json/csv gets replaced on every loop. Merge results then write a single report.

Apply:

-if __name__ == "__main__":
-    log.info(f"Starting VulnScan with {NUM_WORKERS} thread workers and {len(SENSITIVE_PATHS)}...")
-    for path in SENSITIVE_PATHS:
-        expanded_path = os.path.expandvars(path)
-        if os.path.exists(expanded_path):
-            ROOT_DIR = expanded_path
-            main()
-        else:
-            log.warning(f"Path does not exist and will be skipped: {expanded_path}")
+if __name__ == "__main__":
+    log.info(f"Starting VulnScan with {NUM_WORKERS} thread workers and {len(SENSITIVE_PATHS)}...")
+    all_sensitive = []
+    for path in SENSITIVE_PATHS:
+        expanded_path = os.path.expandvars(path)
+        if os.path.exists(expanded_path):
+            all_sensitive.extend(main(expanded_path))
+        else:
+            log.warning(f"Path does not exist and will be skipped: {expanded_path}")
+
+    # Save JSON report (merged)
+    with open(REPORT_JSON, "w", encoding="utf-8") as f:
+        json.dump(all_sensitive, f, indent=2, ensure_ascii=False)
+
+    # Save CSV report (merged)
+    with open(REPORT_CSV, "w", newline="", encoding="utf-8") as f:
+        writer = csv.DictWriter(f, fieldnames=["file", "probability", "copied_to", "reason"])
+        writer.writeheader()
+        for entry in all_sensitive:
+            row = entry.copy()
+            row["reason"] = " | ".join(entry["reason"])
+            writer.writerow(row)
+
+    print()
+    log.debug("Sensitive files detected and backed up:")
+    for entry in all_sensitive:
+        log.debug(f" - {entry['file']} (prob={entry['probability']:.4f})")
+        for line in entry["reason"]:
+            log.debug(f"     -> {line}")
+    print()
+    log.info("Backup completed.\n")
+    log.debug(f"Files copied into: {SAVE_DIR}")
+    log.debug(f"JSON report saved as: {REPORT_JSON}")
+    log.debug(f"CSV report saved as: {REPORT_CSV}")

Note: main() update below.
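
To show the "merge, then write once" idea in isolation, the report writing could be pulled into a helper along these lines. This is a sketch, not code from the PR: `write_reports` is a hypothetical name, and the field names are taken from the diff above.

```python
import csv
import json


def write_reports(entries, json_path, csv_path):
    """Write all merged entries to one JSON file and one CSV file."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)

    fieldnames = ["file", "probability", "copied_to", "reason"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for entry in entries:
            row = dict(entry)
            # CSV cells are flat, so join the list of reason lines.
            row["reason"] = " | ".join(entry["reason"])
            writer.writerow(row)
```

Calling this once after the loop over SENSITIVE_PATHS avoids the overwrite, because each path only contributes to the in-memory list.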

🧹 Nitpick comments (20)
SECURITY.md (1)

9-11: Double-check the 3.6.x release date (looks duplicated).

Both 3.6.x and 3.5.x show "July 26, 2025". If 3.6.0 is shipping with this PR on August 31, 2025, the date here should reflect that. Please confirm and update.

.idea/csv-editor.xml (1)

6-13: Make the CSV path portable.

The key starts with a root backslash, which is Windows-drive-root relative and brittle. Prefer $PROJECT_DIR$ so teammates on any OS don’t break.

Apply this diff:

-        <entry key="\CODE\report.csv">
+        <entry key="$PROJECT_DIR$/CODE/report.csv">

Also confirm you actually want this .idea file in VCS (team-wide IDE setting).

PLANS.md (2)

11-11: Tiny clarity tweak: call out current vs next model.

To avoid confusion with the code/config using 4n1 today, add a quick note that 4n2 is the next bump in 3.6.1.

Apply this diff:

-| Update to model 4n2 of vulnscan                                                                                                                                            | v3.6.1  | ✅                      |
+| Update to model 4n2 of vulnscan (current release uses 4n1)                                                                                                                 | v3.6.1  | ✅                      |

15-15: Track this with an issue so it doesn’t get lost.

Create a GH issue for the wiki automation so it’s schedulable against v4.0.0.

CODE/usb_history.py (4)

13-22: Confirm logger has debug(); otherwise this will throw.

If log.debug isn’t defined in Logger, switch to log.info or guard it.

Possible minimal tweak:

-            log.debug(f"Saved entry: {message}")
+            if hasattr(log, "debug"):
+                log.debug(f"Saved entry: {message}")

60-69: Try DeviceDesc if FriendlyName is missing.

DeviceDesc is a common fallback for USBSTOR entries.

Apply this diff:

     def _get_friendly_name(dev_info_path, device_id):
         """Return the friendly name of a device if available, else the device ID."""
-        try:
-            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, dev_info_path) as dev_key:
-                return winreg.QueryValueEx(dev_key, "FriendlyName")[0]
-        except FileNotFoundError:
-            return device_id
-        except Exception as e:
-            log.warning(f"Failed to read friendly name for {dev_info_path}: {e}")
-            return device_id
+        if winreg is None:
+            return device_id
+        try:
+            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, dev_info_path) as dev_key:
+                try:
+                    return winreg.QueryValueEx(dev_key, "FriendlyName")[0]
+                except FileNotFoundError:
+                    try:
+                        return winreg.QueryValueEx(dev_key, "DeviceDesc")[0]
+                    except FileNotFoundError:
+                        return device_id
+        except OSError as e:
+            log.warning(f"Failed to read friendly name for {dev_info_path}: {e}")
+            return device_id

71-86: Skip cleanly on non-Windows and serialize time as ISO.

Bail early when winreg is None, and write the timestamp in ISO 8601 for consistency.

Apply this diff:

     def read(self):
         """Read all USB devices from USBSTOR and log their info."""
-        log.info("Starting USB history extraction...")
+        if os.name != "nt" or winreg is None:
+            log.warning("USB history is Windows-only; skipping.")
+            return
+        log.info("Starting USB history extraction...")
         reg_path = r"SYSTEM\CurrentControlSet\Enum\USBSTOR"
         try:
             for device_class in self._enum_subkeys(winreg.HKEY_LOCAL_MACHINE, reg_path, log.warning):
                 dev_class_path = f"{reg_path}\\{device_class}"
                 for device_id in self._enum_subkeys(winreg.HKEY_LOCAL_MACHINE, dev_class_path, log.warning):
                     dev_info_path = f"{dev_class_path}\\{device_id}"
                     friendly_name = self._get_friendly_name(dev_info_path, device_id)
-                    last_write = self._get_last_write_time(winreg.HKEY_LOCAL_MACHINE, dev_info_path) or "Unknown"
-                    self._save_history(f"USB Device Found: {friendly_name} | LastWriteTime: {last_write}")
+                    last_write = self._get_last_write_time(winreg.HKEY_LOCAL_MACHINE, dev_info_path)
+                    last_write_str = last_write.isoformat(timespec="seconds") if isinstance(last_write, datetime) else "Unknown"
+                    self._save_history(f"USB Device Found: {friendly_name} | LastWriteTime: {last_write_str}")
             log.info(f"USB history extraction complete, saved to {self.history_path}")
-        except Exception as e:
+        except OSError as e:
             log.error(f"Error during USB history extraction: {e}")
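
For context on the timestamp: `winreg.QueryInfoKey` reports a key's last-modified time as an integer count of 100-nanosecond intervals since 1601-01-01. How `_get_last_write_time` handles that in this PR isn't visible in the diff, so the helper below is a hypothetical sketch of the conversion to ISO 8601, assuming the value is treated as UTC.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC assumed).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_iso(filetime: int) -> str:
    """Convert a FILETIME integer to an ISO 8601 string with second precision."""
    moment = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return moment.isoformat(timespec="seconds")
```

For example, `filetime_to_iso(116444736000000000)` gives `1970-01-01T00:00:00+00:00`, the Unix epoch.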

11-11: Output path might be unwritable.

Writing next to the module can fail under Program Files/UAC. Consider a user-writable dir (e.g., your SAVE_DIR/temp) and make it configurable.

CODE/config.ini (1)

30-30: Path separators are mixed.

Backslashes in the files list are Windows-centric. If this config is used cross-platform, consider normalizing to forward slashes or normalizing at load time.
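
One way to normalize at load time, sketched under the assumption that the files list is a comma-separated string (the actual config.ini layout isn't shown here); `normalize_entries` is a hypothetical helper name:

```python
from pathlib import PureWindowsPath


def normalize_entries(raw: str) -> list:
    """Split a comma-separated list and convert backslashes to forward slashes."""
    entries = [part.strip() for part in raw.split(",")]
    # PureWindowsPath accepts both separators on any OS; as_posix() emits "/".
    return [PureWindowsPath(p).as_posix() for p in entries if p]
```

Only the separators change; spaces inside file names are left alone.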

CODE/encrypted_drive_audit.py (6)

16-31: Harden run_cmd: type hints + configurable timeout (keeps callers unchanged).

Makes intent safer and quieter for linters; still shell=False.

Apply:

-def run_cmd(cmd):
+def run_cmd(cmd: Sequence[str], timeout: int = 30) -> tuple[str, str, int]:
     log.debug(f"Running command: {cmd}")
     try:
-        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
+        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

Also add import:

 import subprocess
 from pathlib import Path
+from typing import Sequence
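
Put together, a self-contained version of the hardened helper might look like this. It is a sketch: the diff above only shows part of `run_cmd`, so the error handling and the `-1` return code on failure are assumptions, not what the PR does.

```python
import subprocess
from typing import Sequence


def run_cmd(cmd: Sequence[str], timeout: int = 30) -> tuple[str, str, int]:
    """Run a command without a shell and return (stdout, stderr, returncode)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
        return proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # Assumed convention: -1 signals that the command never finished.
        return "", f"timed out after {timeout}s", -1
    except OSError as e:
        # Covers a missing executable (FileNotFoundError) and similar launch errors.
        return "", f"could not run command: {e}", -1
```

Existing callers that unpack `out, err, _ = run_cmd([...])` keep working, since the timeout has a default.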

55-58: Windows-only guard + write report to a user-writable dir.

Prevents noisy failures on non-Windows and avoids permission issues under Program Files.

 def main():
-    script_dir = Path(__file__).resolve().parent
-    report_path = script_dir / "win_encrypted_volume_report.txt"
+    if os.name != "nt":
+        log.error("This audit is Windows-only.")
+        return
+    script_dir = Path(__file__).resolve().parent
+    report_dir = Path(os.environ.get("LOCALAPPDATA", str(script_dir)))
+    report_path = report_dir / "win_encrypted_volume_report.txt"

70-76: wmic is deprecated on modern Windows—fallback to PowerShell CIM if missing.

Gives reliable output on Win10/11 boxes where wmic isn’t present.

-        log.info("Gathering logical volumes via wmic")
-        f.write("Logical Volumes (wmic):\n")
-        out, err, _ = run_cmd(["wmic", "logicaldisk", "get",
-                               "DeviceID,DriveType,FileSystem,FreeSpace,Size,VolumeName"])
-        f.write(out + "\n" + err + "\n\n")
+        if have("wmic"):
+            log.info("Gathering logical volumes via wmic")
+            f.write("Logical Volumes (wmic):\n")
+            out, err, _ = run_cmd(["wmic", "logicaldisk", "get",
+                                   "DeviceID,DriveType,FileSystem,FreeSpace,Size,VolumeName"])
+            f.write(out + "\n" + err + "\n\n")
+        else:
+            log.warning("wmic not found; using PowerShell CIM fallback")
+            f.write("Logical Volumes (PowerShell CIM):\n")
+            ps = ("Get-CimInstance Win32_LogicalDisk | "
+                  "Select-Object DeviceID,DriveType,FileSystem,FreeSpace,Size,VolumeName | "
+                  "Format-Table -AutoSize")
+            out, err, _ = run_cmd(["powershell", "-NoProfile", "-NonInteractive", "-Command", ps])
+            f.write(out + "\n" + err + "\n\n")
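
The tool selection can also be separated from the report writing. The sketch below assumes `have` is a thin wrapper around `shutil.which` (its definition isn't shown in this diff) and uses a hypothetical `logical_disk_command` helper that returns the argv to run:

```python
import shutil


def have(tool: str) -> bool:
    """Return True if `tool` resolves to an executable on PATH (assumed behavior)."""
    return shutil.which(tool) is not None


def logical_disk_command(available=have) -> list:
    """Pick the wmic command when available, else the PowerShell CIM fallback."""
    fields = "DeviceID,DriveType,FileSystem,FreeSpace,Size,VolumeName"
    if available("wmic"):
        return ["wmic", "logicaldisk", "get", fields]
    ps = (f"Get-CimInstance Win32_LogicalDisk | "
          f"Select-Object {fields} | Format-Table -AutoSize")
    return ["powershell", "-NoProfile", "-NonInteractive", "-Command", ps]
```

Injecting `available` makes the branch easy to exercise on a machine that has neither tool.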

1-8: Tiny nit: import string for drive loop readability.

 import platform
+import string
 import shutil

85-90: Use ascii_uppercase for clarity.

-            for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
+            for letter in string.ascii_uppercase:
                 path = f"{letter}:"

93-101: Non-interactive PowerShell to avoid prompts/UI stalls.

-            out, err, _ = run_cmd(["powershell", "-NoProfile", "-Command", ps_cmd])
+            out, err, _ = run_cmd(["powershell", "-NoProfile", "-NonInteractive", "-NoLogo", "-Command", ps_cmd])
CODE/vulnscan.py (4)

139-146: Pass root into process_file + avoid scanning the backup folder if it sits under root.

 def scan_directory(root):
     sensitive_files = []
     with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
         futures = []
-        for dirpath, _, filenames in os.walk(root):
+        save_dir_abs = os.path.abspath(SAVE_DIR)
+        for dirpath, _, filenames in os.walk(root):
+            # skip backup folder if it's inside the scanned root
+            if os.path.abspath(dirpath).startswith(save_dir_abs):
+                continue
             for file in filenames:
-                futures.append(executor.submit(process_file, os.path.join(dirpath, file)))
+                futures.append(executor.submit(process_file, os.path.join(dirpath, file), root))
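
One caveat with a plain `startswith` check on path strings: a sibling folder such as `backup2` also starts with `backup`. A component-aware check is sketched below; `is_within` is a hypothetical helper, not something in the PR.

```python
import os


def is_within(path: str, parent: str) -> bool:
    """Return True if `path` is `parent` itself or lies underneath it."""
    path_abs = os.path.normcase(os.path.abspath(path))
    parent_abs = os.path.normcase(os.path.abspath(parent))
    try:
        return os.path.commonpath([path_abs, parent_abs]) == parent_abs
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

In `scan_directory`, the skip condition would then read `if is_within(dirpath, SAVE_DIR): continue`.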

133-135: Don’t catch bare Exception; narrow it.

Keeps real errors visible while still being resilient.

-    except Exception as e:
-        log.error(f"Could not process {filepath}: {e}")
+    except (OSError, UnicodeDecodeError, RuntimeError) as e:
+        log.error(f"Could not process {filepath}: {e}")

16-21: Dockerfile has no extension—add a name-based allowlist.

Right now “Dockerfile” won’t scan.

 TEXT_EXTENSIONS = {
@@
 }
+SPECIAL_FILENAMES = {"Dockerfile", "dockerfile"}

And tweak the gate:

-        if ext.lower() not in TEXT_EXTENSIONS:
+        if ext.lower() not in TEXT_EXTENSIONS and os.path.basename(filepath) not in SPECIAL_FILENAMES:
             return None

175-185: Do you really want to log “reason” lines (may contain secrets)?

Could leak sensitive content into logs. Consider a config flag to enable this only in DEBUG or redact values.
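
If redaction is the preferred route, a minimal masking helper could look like the following. The name `redact` and the default of four visible characters are assumptions for illustration only.

```python
def redact(line: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and mask the rest with asterisks."""
    if len(line) <= visible:
        # Too short to show anything safely; mask it all.
        return "*" * len(line)
    return line[:visible] + "*" * (len(line) - visible)
```

The debug loop would then log `redact(line)` instead of the raw line, while the JSON/CSV reports can keep the full text if that is intended.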

CODE/Logicytics.py (1)

444-444: noinspection is fine, but keep scope tight.

If it was silencing a false positive, cool. If not needed anymore, drop it to avoid masking real issues later.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between cb27001 and d9d1572.

⛔ Files ignored due to path filters (1)
  • CODE/vulnscan/vectorizer.3n3.pkl is excluded by !**/*.pkl
📒 Files selected for processing (10)
  • .idea/csv-editor.xml (1 hunks)
  • CODE/Logicytics.py (1 hunks)
  • CODE/config.ini (2 hunks)
  • CODE/encrypted_drive_audit.py (1 hunks)
  • CODE/logicytics/__init__.py (1 hunks)
  • CODE/usb_history.py (1 hunks)
  • CODE/vulnscan.py (1 hunks)
  • PLANS.md (1 hunks)
  • README.md (0 hunks)
  • SECURITY.md (1 hunks)
💤 Files with no reviewable changes (1)
  • README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-26T15:27:01.170Z
Learnt from: DefinetlyNotAI
PR: DefinetlyNotAI/Logicytics#229
File: CODE/vulnscan.py:187-188
Timestamp: 2025-07-26T15:27:01.170Z
Learning: In CODE/vulnscan.py, the model and vectorizer file paths "vulnscan/Model SenseMini .3n3.pth" and "vulnscan/Vectorizer .3n3.pkl" intentionally include trailing spaces before the file extensions, and these match the actual file names on disk. The files will be renamed in the future to remove the trailing spaces.

Applied to files:

  • CODE/vulnscan.py
  • CODE/config.ini
🧬 Code graph analysis (5)
CODE/Logicytics.py (1)
CODE/logicytics/FileManagement.py (4)
  • Zip (55-230)
  • and_hash (204-230)
  • FileManagement (12-230)
  • __move_files (190-201)
CODE/logicytics/__init__.py (1)
CODE/logicytics/Config.py (1)
  • __config_data (5-42)
CODE/encrypted_drive_audit.py (2)
CODE/logicytics/Logger.py (3)
  • warning (262-272)
  • error (274-284)
  • info (250-260)
CODE/logicytics/Checks.py (1)
  • admin (13-23)
CODE/usb_history.py (1)
CODE/logicytics/Logger.py (3)
  • error (274-284)
  • warning (262-272)
  • info (250-260)
CODE/vulnscan.py (1)
CODE/logicytics/Logger.py (4)
  • debug (185-192)
  • error (274-284)
  • info (250-260)
  • warning (262-272)
🪛 Ruff (0.12.2)
CODE/encrypted_drive_audit.py

19-19: subprocess call: check for execution of untrusted input

(S603)

CODE/usb_history.py

21-21: Do not catch blind exception: Exception

(BLE001)


41-41: Do not catch blind exception: Exception

(BLE001)


56-56: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


84-84: Do not catch blind exception: Exception

(BLE001)

CODE/vulnscan.py

126-131: Consider moving this statement to an else block

(TRY300)


133-133: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (2)
CODE/config.ini (1)

29-30: Fix file-existence check
The script’s Path(f) lookups run from the repo root, but config.ini lists files relative to CODE/. Update the check to either cd CODE before running or prefix each entry with CODE/ (e.g. Path("CODE")/f.replace("\\","/")), then rerun to confirm all listed scripts and assets actually exist.
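
A check along the lines described could be sketched as below. `missing_files` is a hypothetical helper; it takes the entries as a list and resolves each one against a base directory (CODE/ in this repo) after swapping backslashes for forward slashes.

```python
from pathlib import Path


def missing_files(entries, base_dir):
    """Return the entries that do not exist under base_dir."""
    base = Path(base_dir)
    return [e for e in entries if not (base / e.replace("\\", "/")).exists()]
```

An empty result from `missing_files(entries, "CODE")`, run from the repo root, would mean every listed script and asset was found.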

CODE/logicytics/__init__.py (1)

125-126: Exporting config via __all__ looks good.

Makes “from logicytics import config” work across the repo.

@github-project-automation github-project-automation bot moved this from Todo to Waiting on Review in Issue Board Aug 31, 2025
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
CODE/vulnscan.py (4)

83-88: process_file needs the root directory passed in

The function currently depends on a global ROOT_DIR that might not be defined when this runs, causing a NameError.

-def process_file(filepath):
+def process_file(filepath, root_dir):

And you'll need to update line 121 too.


121-121: ROOT_DIR isn't defined here - this will crash!

This line uses ROOT_DIR but it's only set in the main section, so calling process_file from anywhere else will cause a NameError.

-        rel_path = os.path.relpath(filepath, ROOT_DIR)
+        rel_path = os.path.relpath(filepath, root_dir)

156-158: main() should be more flexible by accepting root_dir parameter

Making main() accept root_dir as a parameter instead of using the global would make it more reusable.

-def main():
-    log.info(f"Scanning directory: {ROOT_DIR} - This will take some time...")
-    sensitive = scan_directory(ROOT_DIR)
+def main(root_dir):
+    log.info(f"Scanning directory: {root_dir} - This will take some time...")
+    sensitive = scan_directory(root_dir)

And update the main section to pass ROOT_DIR to main().


189-189: Incomplete log message is confusing

The log message says "paths..." but doesn't finish the thought about what the paths are for.

-    log.info(f"Starting VulnScan with {NUM_WORKERS} thread workers and {len(SENSITIVE_PATHS)} paths...")
+    log.info(f"Starting VulnScan with {NUM_WORKERS} thread workers and {len(SENSITIVE_PATHS)} paths to scan...")
🧹 Nitpick comments (3)
CODE/vulnscan.py (3)

104-104: This embed_model.encode call could be slow for large files

Just a heads up - if someone has a huge text file, encoding all those lines at once might eat up your GPU memory or take forever. Maybe consider batching it for really big files?

Consider adding a batch size limit:

# Embed all lines
-embeddings = embed_model.encode(lines, convert_to_tensor=True, device=DEVICE)
+batch_size = 1000
+if len(lines) > batch_size:
+    embeddings = []
+    for i in range(0, len(lines), batch_size):
+        batch = lines[i:i+batch_size]
+        batch_emb = embed_model.encode(batch, convert_to_tensor=True, device=DEVICE)
+        embeddings.append(batch_emb)
+    embeddings = torch.cat(embeddings, dim=0)
+else:
+    embeddings = embed_model.encode(lines, convert_to_tensor=True, device=DEVICE)
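
The batching loop can also be factored into a small helper. This is only a sketch: `iter_batches` and `encode_in_batches` are hypothetical names, and `encode` stands in for whatever embedding call the module uses.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(lines, encode, batch_size=1000):
    """Apply encode to each batch and collect the per-batch results.

    encode is a stand-in for the real embedding call; the caller decides
    how to merge the results (for tensors, something like torch.cat).
    """
    return [encode(batch) for batch in iter_batches(lines, batch_size)]
```

Keeping the slicing separate from the encoding makes the batch size easy to tune and the empty-input case trivial.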

126-135: The return statement should be in an else block

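The static analysis tool (Ruff TRY300) is right: the successful return could move into an else block for cleaner flow. The minimal change below only moves `return None` into the except handler so the failure path is explicit.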

        return {
            "file": filepath,
            "probability": max_prob,
            "copied_to": backup_path,
            "reason": top_lines
        }

    except Exception as e:
        log.error(f"Could not process {filepath}: {e}")
-    return None
+        return None
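
For reference, a sketch of the try/except/else shape that TRY300 asks for. The function `summarize_file` and the logger name are hypothetical; only the control flow is the point:

```python
import logging

log = logging.getLogger("vulnscan_sketch")  # hypothetical logger name


def summarize_file(filepath):
    """Return a small report dict, or None if the file cannot be read."""
    try:
        # Only the calls that can fail live in the try body
        with open(filepath, "r", encoding="utf-8") as fh:
            lines = fh.readlines()
    except (OSError, UnicodeDecodeError) as e:
        log.error("Could not process %s: %s", filepath, e)
        return None
    else:
        # Runs only when the try body raised nothing
        return {"file": filepath, "line_count": len(lines)}
```

With the success path in `else`, an exception raised while building the result is not mistaken for a read failure.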

133-134: The broad exception handling hides important errors

Catching Exception is pretty broad and might hide bugs. At least log what type of error happened so you can debug issues.

-    except Exception as e:
-        log.error(f"Could not process {filepath}: {e}")
+    except (UnicodeDecodeError, OSError) as e:
+        log.error(f"Could not read file {filepath}: {e}")
+    except Exception as e:
+        log.error(f"Unexpected error processing {filepath}: {type(e).__name__}: {e}")
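
A small sketch of how the layered handlers behave, using a hypothetical `describe_failure` helper that returns the message instead of logging it. Handler order matters: the specific tuple must come before the catch-all.

```python
def describe_failure(action):
    """Run action() and return a log-style message for any failure, else None."""
    try:
        action()
    except (UnicodeDecodeError, OSError) as e:
        # Expected I/O problems: missing file, bad encoding, permissions
        return f"Could not read file: {e}"
    except Exception as e:
        # Anything else is unexpected, so include the exception type
        return f"Unexpected error: {type(e).__name__}: {e}"
    return None
```

Note that Ruff's BLE001 will likely still flag the final `except Exception`, so this trades a lint warning for a more informative message.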
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d9d1572 and c621162.

📒 Files selected for processing (1)
  • CODE/vulnscan.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-26T15:27:01.170Z
Learnt from: DefinetlyNotAI
PR: DefinetlyNotAI/Logicytics#229
File: CODE/vulnscan.py:187-188
Timestamp: 2025-07-26T15:27:01.170Z
Learning: In CODE/vulnscan.py, the model and vectorizer file paths "vulnscan/Model SenseMini .3n3.pth" and "vulnscan/Vectorizer .3n3.pkl" intentionally include trailing spaces before the file extensions, and these match the actual file names on disk. The files will be renamed in the future to remove the trailing spaces.

Applied to files:

  • CODE/vulnscan.py
🧬 Code graph analysis (1)
CODE/vulnscan.py (1)
CODE/logicytics/Logger.py (5)
  • debug (185-192)
  • error (274-284)
  • info (250-260)
  • newline (236-248)
  • warning (262-272)
🪛 Ruff (0.12.2)
CODE/vulnscan.py

126-131: Consider moving this statement to an else block

(TRY300)


133-133: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (2)
CODE/vulnscan.py (2)

144-145: You're not passing root_dir to process_file

Since process_file needs root_dir as a parameter (as mentioned earlier), you need to update this call too.

-                futures.append(executor.submit(process_file, os.path.join(dirpath, file)))
+                futures.append(executor.submit(process_file, os.path.join(dirpath, file), root))

194-194: Update the main() call to pass ROOT_DIR

Once you fix the main() function signature, you'll need to update this call:

-            main()
+            main(ROOT_DIR)

Fixes for PR (not to include in release):
    Guard os.cpu_count() None to avoid crash.
    Enumerate subkeys without the 1024 cap; use QueryInfoKey.
    Close the registry handle (resource leak) and tighten the call.

Signed-off-by: Shahm Najeeb <Nirt_12023@outlook.com>
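
The `os.cpu_count()` guard mentioned in the commit message could look roughly like this. The helper name, the multiplier, and the cap are assumptions for illustration; the commit's actual formula is not shown here.

```python
import os


def pick_worker_count(cap=32):
    """Choose a thread count that is always at least 1."""
    # os.cpu_count() may return None when the count cannot be determined
    cpus = os.cpu_count() or 1
    return max(1, min(cap, cpus * 2))
```

The `or 1` fallback is what avoids a TypeError when the CPU count is unavailable.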
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
CODE/usb_history.py (1)

1-4: Windows-only import: add a one-liner note and call it a day.

I know you prefer unguarded imports here. Drop a tiny comment so future you remembers it’s intentional.

 import ctypes
 import os
 import winreg
 from datetime import datetime, timedelta
+
+# Note: Windows-only module; requires winreg/registry access by design.
🧹 Nitpick comments (6)
CODE/usb_history.py (3)

21-22: Don’t swallow everything on write; narrow the except.

Catching Exception here can hide real bugs. Limit to OS/IO errors.

-        except Exception as e:
+        except OSError as e:
             log.error(f"Failed to write history: {e}")

67-69: Same here: narrow the except for friendly-name lookups.

Keep it to OS/registry-related errors. PermissionError is a subclass of OSError, so catching OSError alone covers it.

-        except Exception as e:
+        except OSError as e:
             log.warning(f"Failed to read friendly name for {dev_info_path}: {e}")
             return device_id

84-85: And here: avoid blind catch around the whole extraction.

Use a tighter exception set so you don’t mask programming errors. OSError already covers PermissionError and the errors raised by winreg calls.

-        except Exception as e:
+        except OSError as e:
             log.error(f"Error during USB history extraction: {e}")
CODE/vulnscan.py (3)

106-112: Speed up: batch predict instead of looping per line.

Way faster and less Python overhead.

-        probs = []
-        for emb in embeddings:
-            with torch.no_grad():
-                output = model(emb.unsqueeze(0))
-                probs.append(torch.sigmoid(output).item())
+        with torch.no_grad():
+            outputs = model(embeddings)
+            probs = torch.sigmoid(outputs).squeeze(-1).tolist()

139-152: Avoid managing the futures list by hand.

Let executor.map drive the loop instead of collecting futures yourself. Optional, but tidier for big trees. Note that Executor.map still submits every task up front by default, so this simplifies the code more than it cuts peak memory. The chunksize argument is left out because ThreadPoolExecutor ignores it.

-    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
-        futures = []
-        for dirpath, _, filenames in os.walk(root):
-            for file in filenames:
-                futures.append(executor.submit(process_file, os.path.join(dirpath, file)))
-
-        for future in as_completed(futures):
-            result = future.result()
-            if result:
-                sensitive_files.append(result)
+    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
+        def _iter_paths():
+            for dirpath, _, filenames in os.walk(root):
+                for file in filenames:
+                    yield os.path.join(dirpath, file)
+        for result in executor.map(process_file, _iter_paths()):
+            if result:
+                sensitive_files.append(result)
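
If peak memory on very large trees is the real concern, one option is to cap the number of in-flight tasks. The sketch below is a hypothetical helper (`map_bounded` and its parameters are not part of the PR) built only on `concurrent.futures`:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def map_bounded(func, items, max_workers=4, max_in_flight=64):
    """Yield func(item) results while keeping at most max_in_flight tasks queued.

    Results arrive in completion order, not input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pending = set()
        for item in items:
            pending.add(executor.submit(func, item))
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes before submitting more
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    yield future.result()
        # Drain whatever is still running
        for future in pending:
            yield future.result()
```

Because the input iterable is consumed lazily, a generator over `os.walk` would only be advanced as slots free up.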

69-74: PyTorch load is pickle-based; validate the path before loading.

Not a blocker, but add a quick guard so you only load from your expected directory.

-checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
+if not MODEL_PATH.startswith("vulnscan/") and not MODEL_PATH.startswith("vulnscan\\"):
+    raise RuntimeError(f"Refusing to load model from unexpected path: {MODEL_PATH}")
+checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
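
A plain string-prefix check accepts paths such as `vulnscan/../secrets/model.pth`. A slightly stricter sketch resolves both paths first; `is_inside` is a hypothetical helper, and the directory name is whatever the project treats as trusted:

```python
from pathlib import Path


def is_inside(path, allowed_dir):
    """Return True if path resolves to a location under allowed_dir."""
    resolved = Path(path).resolve()
    base = Path(allowed_dir).resolve()
    # Comparing resolved paths defeats ".." segments and symlink tricks
    return resolved == base or base in resolved.parents
```

The model would then be loaded only when `is_inside(MODEL_PATH, "vulnscan")` holds, with relative paths resolved against the current working directory.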
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c621162 and bdc9196.

📒 Files selected for processing (3)
  • CODE/config.ini (2 hunks)
  • CODE/usb_history.py (1 hunks)
  • CODE/vulnscan.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-31T16:52:18.725Z
Learnt from: DefinetlyNotAI
PR: DefinetlyNotAI/Logicytics#235
File: CODE/usb_history.py:1-4
Timestamp: 2025-08-31T16:52:18.725Z
Learning: In the usb_history.py module, the user prefers to keep Windows-only imports unguarded despite potential import-time crashes on non-Windows systems, reasoning that it's a Windows-only utility with runtime checks (though no actual Windows platform checks exist in the code).

Applied to files:

  • CODE/usb_history.py
📚 Learning: 2025-07-26T15:27:01.170Z
Learnt from: DefinetlyNotAI
PR: DefinetlyNotAI/Logicytics#229
File: CODE/vulnscan.py:187-188
Timestamp: 2025-07-26T15:27:01.170Z
Learning: In CODE/vulnscan.py, the model and vectorizer file paths "vulnscan/Model SenseMini .3n3.pth" and "vulnscan/Vectorizer .3n3.pkl" intentionally include trailing spaces before the file extensions, and these match the actual file names on disk. The files will be renamed in the future to remove the trailing spaces.

Applied to files:

  • CODE/vulnscan.py
  • CODE/config.ini
📚 Learning: 2025-08-31T16:52:57.528Z
Learnt from: DefinetlyNotAI
PR: DefinetlyNotAI/Logicytics#235
File: CODE/vulnscan.py:83-88
Timestamp: 2025-08-31T16:52:57.528Z
Learning: In the Logicytics codebase, all modules are designed to be standalone and should intentionally break if imported as libraries. Global dependencies like ROOT_DIR in CODE/vulnscan.py are acceptable and intended as part of this design philosophy.

Applied to files:

  • CODE/vulnscan.py
🧬 Code graph analysis (2)
CODE/usb_history.py (1)
CODE/logicytics/Logger.py (3)
  • error (274-284)
  • warning (262-272)
  • info (250-260)
CODE/vulnscan.py (1)
CODE/logicytics/Logger.py (5)
  • debug (185-192)
  • error (274-284)
  • info (250-260)
  • newline (236-248)
  • warning (262-272)
🪛 Ruff (0.12.2)
CODE/usb_history.py

21-21: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


84-84: Do not catch blind exception: Exception

(BLE001)

CODE/vulnscan.py

126-131: Consider moving this statement to an else block

(TRY300)


133-133: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
CODE/usb_history.py (2)

24-42: Nice: handle is always closed.

Good use of HANDLE + finally to avoid leaks. 👍


43-57: Solid subkey enumeration.

QueryInfoKey + bounded loop > magic 1024 cap. Clean.
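
For readers unfamiliar with the pattern, a sketch of count-bounded enumeration. The helper `list_subkeys` is hypothetical, and passing the registry module in as `reg` is a choice made for this sketch so it can be exercised off Windows; usb_history.py itself imports winreg directly.

```python
def list_subkeys(key, reg):
    """Return the names of all direct subkeys of an open registry key.

    reg is the registry module to use (pass winreg on Windows).
    """
    # QueryInfoKey returns (subkey_count, value_count, last_modified)
    subkey_count = reg.QueryInfoKey(key)[0]
    return [reg.EnumKey(key, index) for index in range(subkey_count)]
```

On Windows this would be called as `list_subkeys(key, winreg)` with a key opened via `winreg.OpenKey`; the loop bound comes from the registry itself rather than a fixed cap.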

CODE/config.ini (2)

29-30: Version/file list update looks good.

3.6.0 bump and adding encrypted_drive_audit.py + usb_history.py to files is consistent with the PR scope.


104-112: VulnScan section is clearer now.

Good call on the comments and simplified settings. Matches code expectations.

@DefinetlyNotAI DefinetlyNotAI merged commit 0ea294b into main Sep 4, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from Waiting on Review to Done in Issue Board Sep 4, 2025
@DefinetlyNotAI DefinetlyNotAI deleted the v3.6.0 branch September 4, 2025 18:35
DefinetlyNotAI added a commit that referenced this pull request Sep 18, 2025
Labels

  • request/Important: New feature or request, top priority, for next update
  • size/XL: Huge size pr
  • type/Code: Related to the Code part
  • type/Documentation: Improvements or additions to commentations

Projects

Status: Done
