
file handles / processes / atime #464

Draft: wants to merge 12 commits into main


0dm (Collaborator) commented Aug 14, 2023

There are three programs. The names are just temporary, since I didn't know what to call them.

openadapt/external:

  • lm.py (walks through specific directories and collects file access times).
    Timing (from Slack), to go through 18622 files:
    python3 lm.py  0.07s user 0.26s system 47% cpu 0.706 total
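A minimal sketch of the approach described for lm.py, assuming a plain os.walk plus os.stat traversal. collect_access_times and its limit parameter are hypothetical names, not lm.py's actual code. Access times also depend on filesystem settings (for example Linux relatime/noatime mounts, or NTFS last-access updates being disabled), so results are only as fresh as the OS keeps them.

```python
import os


def collect_access_times(roots, limit=None):
    """Walk each root directory and return (path, atime) pairs, newest first.

    Hypothetical helper sketching what lm.py is described as doing;
    it is not lm.py's actual API.
    """
    entries = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # st_atime: last access time, in seconds since the epoch.
                    atime = os.stat(path).st_atime
                except OSError:
                    # Files can vanish or be unreadable mid-walk; skip them.
                    continue
                entries.append((path, atime))
    # Most recently accessed files first.
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries if limit is None else entries[:limit]
```

For example, collect_access_times([Path.home() / "Documents"], limit=10) would return the ten most recently accessed files under Documents (os.walk accepts pathlib paths).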

Windows only:

AvidEslami (Collaborator) commented Aug 15, 2023

Here is some more timing data for different directories:

    Time 1: 0:00:31.632331, User directory                                       273833  files
    Time 2: 0:00:10.710798, User directory without AppData and .exe files         47010  files
    Time 3: 0:00:00.924593, Documents, Desktop, Downloads directories             13330  files

At around 10 seconds, Time 2 is probably good enough for recording file signals. The remaining problem is filtering out files accessed by background processes. With stricter whitelists/blacklists we can probably exclude most of these arbitrary files by file type, but if that isn't sufficient, @0dm, do you think it would be possible to find the previously opened files of a process by its PID, since we can already get the PID of the current window?
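One way to apply the whitelist/blacklist idea is to prune directories during the walk instead of filtering afterwards, so excluded trees like AppData are never visited at all. A sketch, assuming blacklists modeled on the Time 2 configuration (no AppData, no .exe files); EXCLUDED_DIRS, EXCLUDED_EXTENSIONS, and iter_candidate_files are hypothetical names.

```python
import os

# Hypothetical blacklists, loosely mirroring the "Time 2" run
# (user directory without AppData and .exe files). Tune for real use.
EXCLUDED_DIRS = {"AppData"}
EXCLUDED_EXTENSIONS = {".exe"}


def iter_candidate_files(root, excluded_dirs=EXCLUDED_DIRS,
                         excluded_exts=EXCLUDED_EXTENSIONS):
    """Yield file paths under root, skipping blacklisted directories and extensions."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            # Extension check is case-insensitive (".EXE" is excluded too).
            if os.path.splitext(name)[1].lower() in excluded_exts:
                continue
            yield os.path.join(dirpath, name)
```

Directory names are matched case-sensitively here; on Windows it may be worth lowercasing them as well.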

Overall this seems promising and I'll see how well it works during a recording!

0dm (Collaborator, Author) commented Aug 15, 2023

How many files were in each test? Could you provide the len() of files?

0dm (Collaborator, Author) commented Aug 15, 2023

> @0dm do you think it would be possible to somehow find the previously opened files of a process by its PID, since we're already able to access the PID of the current window?

For currently open files, I think it might be possible using handle.exe (https://learn.microsoft.com/en-ca/sysinternals/downloads/handle).
I don't think previously opened files can be recovered without some kind of monitoring/logging.
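A sketch of querying handle.exe for a single PID from Python. It assumes handle.exe is on PATH (and Handle typically needs administrator rights). The line format matched by FILE_LINE is an assumption about Handle's text output and may need adjusting for your version; handle_open_files and parse_file_handles are hypothetical names.

```python
import re
import subprocess

# Assumed shape of a file-handle line in handle.exe output, e.g.
#   "  44: File  (RW-)   C:\Users\me\notes.txt"
# The "(RW-)" access column may be absent; treat this pattern as an assumption.
FILE_LINE = re.compile(
    r"^\s*[0-9A-Fa-f]+:\s+File\s+(?:\([^)]*\)\s+)?(?P<path>\S.*?)\s*$"
)


def parse_file_handles(output: str) -> list[str]:
    """Extract file paths from handle.exe text output."""
    paths = []
    for line in output.splitlines():
        match = FILE_LINE.match(line)
        if match:
            paths.append(match.group("path"))
    return paths


def handle_open_files(pid: int, handle_exe: str = "handle.exe") -> list[str]:
    """Run Sysinternals handle.exe for one process (Windows only)."""
    # -p restricts output to one process; -accepteula skips the EULA prompt;
    # -nobanner drops the copyright header.
    result = subprocess.run(
        [handle_exe, "-accepteula", "-nobanner", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return parse_file_handles(result.stdout)
```

This only sees files that are open at the moment it runs, so it would miss files an application opens and closes quickly.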

AvidEslami (Collaborator) commented
> For current files I think it might be possible using handle.exe

Yeah, for sure. It's just unfortunate that many applications only keep files open momentarily, which makes it impractical to catch them via handles. I'll see how well access time does on its own 👍

abrichr (Contributor) commented Mar 2, 2024

@0dm this looks great. Is this ready to merge? Can we add a small test? Any suggestions for mac?

Can you please also document usage, including build commands? 🙏

abrichr added the "help wanted" (Extra attention is needed) label on Mar 2, 2024

0dm (Collaborator, Author) commented Mar 3, 2024

> @0dm this looks great. Is this ready to merge? Can we add a small test?

There are tests in lm.py, but they don't currently validate the outputs. Testing qsysinfo is a little awkward; one way would be to open a file handle and check whether the tool's stdout reports that handle.
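A sketch of that test idea: hold a temporary file open, run the tool, and check whether the file's name shows up in its stdout. tool_reports_open_file is a hypothetical helper; how qsysinfo would actually be invoked (binary path, arguments) is not assumed here, so the command is passed in by the caller.

```python
import os
import subprocess
import tempfile


def tool_reports_open_file(tool_cmd: list[str]) -> bool:
    """Return True if tool_cmd's stdout mentions a file this process holds open.

    Opens a temporary file, keeps the handle open while tool_cmd runs,
    then checks whether the file's base name appears in the tool's output.
    """
    with tempfile.NamedTemporaryFile(suffix=".handle-probe") as probe:
        result = subprocess.run(tool_cmd, capture_output=True, text=True, check=True)
        return os.path.basename(probe.name) in result.stdout
```

On Linux the helper itself can be sanity-checked without qsysinfo by passing ["ls", "-l", f"/proc/{os.getpid()}/fd"], since that listing shows the targets of the process's open descriptors.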

I'm not sure it should be merged yet, since it hasn't been fully "connected" to OpenAdapt; it's more like two standalone tools, though we can call get_recent_files internally. For qsysinfo and vwin, we previously talked about using something like pipes to communicate.
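If pipes are the route, the Python side could be as simple as reading the tool's stdout line by line as it arrives. A sketch, assuming the tool writes one record per line; stream_tool_output is a hypothetical name, and the record format is left open.

```python
import subprocess
from collections.abc import Iterator


def stream_tool_output(cmd: list[str]) -> Iterator[str]:
    """Start a helper tool and yield each line it writes to stdout.

    Raises CalledProcessError if the tool exits with a non-zero status.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

Because C stdio fully buffers stdout when it is a pipe, qsysinfo/vwin would need to call fflush(stdout) after each record (or switch buffering with setvbuf) for lines to arrive promptly.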

> Any suggestions for mac?

There's lsof, which lists all open files; lsof -p <pid> shows the files opened by a specific process.
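A sketch of wrapping lsof -p <pid> for macOS, using lsof's field output (-F) so the result is easy to parse: in that mode each line begins with a one-letter field identifier, and n marks the file name. lsof_open_files and parse_lsof_names are hypothetical names.

```python
import subprocess


def parse_lsof_names(output: str) -> list[str]:
    """Extract file names from lsof field output (-Fn).

    Each field line starts with a one-character identifier;
    'n' marks the name field.
    """
    return [line[1:] for line in output.splitlines() if line.startswith("n")]


def lsof_open_files(pid: int) -> list[str]:
    """List files open by one process using lsof (macOS/Linux)."""
    # -p selects the process; -Fn requests field output including names.
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-Fn"],
        capture_output=True, text=True,
    )
    # lsof exits with 1 when it finds nothing (or hits warnings), so the
    # return code is not treated as an error here.
    return parse_lsof_names(result.stdout)
```

The names include non-regular files such as the current directory, sockets, and pipes, so some filtering would still be needed.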

> Can you please also document usage, including build commands? 🙏

Yes. I'll also rename external to tools, and lm.py should probably get a more descriptive name too; any suggestions? IIRC, when I made the script I typed two random letters and then said it stood for "last modified" 😅

There's a Makefile with the build commands for qsysinfo.c and vwin.c, and precompiled binaries are in /bin. For now, I'll add a small README on how to use them.

abrichr (Contributor) commented Mar 7, 2024

I think we can just name it files.py 😄

Labels: help wanted (Extra attention is needed)
3 participants